Abstract: A class of estimating functions is introduced for the regression parameter of the Cox proportional hazards model to allow unknown failure statuses on some study subjects. The consistency and asymptotic normality of the resulting estimators are established under mild conditions. An adaptive estimator which achieves the minimum variance-covariance bound of the class is constructed. Numerical studies demonstrate that the asymptotic approximations are adequate for practical use and that the efficiency gain of the adaptive estimator over the complete-case analysis can be quite substantial. Similar methods are also developed for the nonparametric estimation of the survival function of a homogeneous population and for the estimation of the cumulative baseline hazard function under the Cox model.



1. Introduction 

Let $(T_i, C_i, Z_i')$ $(i = 1, \ldots, n)$ be $n$ independent replicates of the random vector $(T, C, Z')$, where $T$ and $C$ denote the failure and censoring times, and $Z$ denotes a $p \times 1$ vector of possibly time-varying covariates. The observations consist of $(X_i, \delta_i, Z_i')$ $(i = 1, \ldots, n)$, where $X_i = T_i \wedge C_i$ and $\delta_i = 1(T_i \le C_i)$. Assume that $T_i$ and $C_i$ are conditionally independent given $Z_i$.

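The widely used Cox semiparametric regression model [?] postulates that, conditional on $Z(t)$, the hazard function $\lambda(t)$ for $T$ takes the form $e^{\beta_0' Z(t)}\lambda_0(t)$, where $\beta_0$ is a $p$-dimensional regression parameter and $\lambda_0(\cdot)$ is an unspecified baseline hazard function. The maximum partial likelihood estimator $\hat\beta_f$ for $\beta_0$ is obtained by maximizing

$$L(\beta) = \prod_{i=1}^n \left\{\frac{e^{\beta' Z_i(X_i)}}{\sum_{j=1}^n 1(X_j \ge X_i)\, e^{\beta' Z_j(X_i)}}\right\}^{\delta_i} \tag{1.1}$$

or by solving $\{S(\beta) = 0\}$, where

$$S(\beta) = \sum_{i=1}^n \delta_i \left\{Z_i(X_i) - \frac{\sum_{j=1}^n 1(X_j \ge X_i)\, e^{\beta' Z_j(X_i)} Z_j(X_i)}{\sum_{j=1}^n 1(X_j \ge X_i)\, e^{\beta' Z_j(X_i)}}\right\}. \tag{1.2}$$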

Under suitable regularity conditions, $n^{-1/2}S(\beta_0) \to N(0, V)$ and $n^{1/2}(\hat\beta_f - \beta_0) \to N(0, V^{-1})$ in distribution, where $V = -\lim_{n\to\infty} n^{-1}\partial S(\beta_0)/\partial\beta_0$. These asymptotic properties provide the basis for making inference about $\beta_0$. For a one-dimensional (dichotomous) $Z$, the nonparametric test based on $S(0)$ for testing $\beta_0 = 0$ is better known as the (two-sample) log rank test.
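For concreteness, the score (1.2) can be evaluated directly. The following is a minimal sketch, assuming a single time-fixed covariate and no tied observation times; the function name `cox_score` is ours and is not part of any library. Each observed failure contributes its covariate minus the risk-set average weighted by $e^{\beta Z_j}$, as in (1.2).

```python
import numpy as np


def cox_score(beta, x, delta, z):
    """Partial likelihood score S(beta) of (1.2) for a scalar, time-fixed covariate.

    x: observed times X_i; delta: failure indicators delta_i; z: covariates Z_i.
    Assumes there are no tied observation times.
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta)
    z = np.asarray(z, dtype=float)
    score = 0.0
    for i in np.flatnonzero(delta == 1):
        at_risk = x >= x[i]                    # risk set {j : X_j >= X_i}
        w = np.exp(beta * z[at_risk])          # weights e^{beta Z_j}
        zbar = np.sum(w * z[at_risk]) / np.sum(w)
        score += z[i] - zbar                   # failing covariate minus weighted average
    return score


# Three subjects: failures at times 1 and 2, censoring at time 3.
print(cox_score(0.0, [1.0, 2.0, 3.0], [1, 1, 0], [1.0, 0.0, 0.0]))
```

For a scalar covariate $S(\beta)$ is nonincreasing in $\beta$ (its derivative is minus a sum of weighted variances), so its root $\hat\beta_f$ can be located by any bracketing method.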

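The estimation of the cumulative hazard function $\Lambda(t) = \int_0^t \lambda(s)\,ds$ and the survival function $F(t) = e^{-\Lambda(t)}$ is also of interest. In the one-sample case, where no covariates are modeled, $\Lambda(t)$ is commonly estimated by the Nelson–Aalen estimator

$$\hat\Lambda_{NA}(t) = \sum_{i:\, X_i \le t} \frac{\delta_i}{\sum_{j=1}^n 1(X_j \ge X_i)}, \tag{1.3}$$

and the corresponding survival function estimator $\hat F_{NA}(t) = e^{-\hat\Lambda_{NA}(t)}$ is asymptotically equivalent to the well-known Kaplan–Meier estimator

$$\hat F_{KM}(t) = \prod_{i:\, X_i \le t}\left\{1 - \frac{\delta_i}{\sum_{j=1}^n 1(X_j \ge X_i)}\right\}. \tag{1.4}$$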
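The estimators (1.3) and (1.4) can be sketched as follows, again assuming no tied times; the function names and the simulated exponential data are illustrative assumptions, not part of the paper. The last line compares $e^{-\hat\Lambda_{NA}(t)}$ with $\hat F_{KM}(t)$, which should be numerically close in a moderate sample.

```python
import numpy as np


def nelson_aalen(x, delta, t):
    """Nelson-Aalen estimate (1.3) of Lambda(t); assumes no tied times."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta)
    return float(sum(delta[i] / np.sum(x >= x[i]) for i in range(len(x)) if x[i] <= t))


def kaplan_meier(x, delta, t):
    """Kaplan-Meier product-limit estimate (1.4) at time t; assumes no tied times."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta)
    surv = 1.0
    for i in range(len(x)):
        if x[i] <= t:
            surv *= 1.0 - delta[i] / np.sum(x >= x[i])   # one factor per observed time
    return surv


rng = np.random.default_rng(0)
t_fail = rng.exponential(1.0, 500)        # hypothetical failure times
c = rng.exponential(2.0, 500)             # hypothetical censoring times
x, d = np.minimum(t_fail, c), (t_fail <= c).astype(int)
print(np.exp(-nelson_aalen(x, d, 1.0)), kaplan_meier(x, d, 1.0))
```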



Motivated by the Nelson–Aalen estimator, Breslow suggested that the cumulative baseline hazard function $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$ under the Cox model be estimated by

$$\hat\Lambda_B(t) = \sum_{i:\, X_i \le t} \frac{\delta_i}{\sum_{j=1}^n 1(X_j \ge X_i)\, e^{\hat\beta_f' Z_j(X_i)}}. \tag{1.5}$$

Both $n^{1/2}\{\hat\Lambda_{NA}(\cdot) - \Lambda(\cdot)\}$ and $n^{1/2}\{\hat\Lambda_B(\cdot) - \Lambda_0(\cdot)\}$ converge weakly to zero-mean Gaussian processes [?, ?, ?, 14].
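A minimal sketch of the Breslow estimator (1.5), assuming a scalar time-fixed covariate, no tied times, and a value of $\beta$ supplied by the caller (in practice $\hat\beta_f$); the function name is ours. Setting $\beta = 0$ recovers the Nelson–Aalen estimator (1.3), which the usage line illustrates.

```python
import numpy as np


def breslow(beta, x, delta, z, t):
    """Breslow estimate (1.5) of Lambda_0(t) for a scalar, time-fixed covariate.

    beta would normally be the partial likelihood estimate; assumes no tied times.
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta)
    z = np.asarray(z, dtype=float)
    total = 0.0
    for i in np.flatnonzero((delta == 1) & (x <= t)):
        # 1 / sum_j Y_j(X_i) e^{beta Z_j}
        total += 1.0 / np.sum(np.exp(beta * z[x >= x[i]]))
    return total


# With beta = 0 the Breslow estimator reduces to the Nelson-Aalen estimator (1.3).
print(breslow(0.0, [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1], [0.5, -1.0, 2.0, 0.0], 3.0))
```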



All of the aforementioned procedures assume complete measurements on the failure indicators $\delta_i$ $(i = 1, \ldots, n)$. In many applications, however, the values of $\{\delta_i\}$ are missing for some study subjects. We shall distinguish between two types of missingness. For Type I missingness, $\{\delta_i\}$ are missing completely at random among all subjects. For Type II missingness, $\{\delta_i\}$ take the value 0 for some subjects and are missing completely at random among the remaining subjects. By missing completely at random, we mean that the missingness mechanism is independent of everything else. The following two examples demonstrate how such missingness arises in practice.

Example 1. (Type I missingness). Suppose that a series system has two independent components I and II, and let $T$ and $C$ represent the times to failure of I and II, respectively. The potential observations for a single system consist of $X = T \wedge C$ and $\delta = 1(T \le C)$. Suppose that a large number of systems are operated until failure. Also suppose that the diagnosis of a system to identify which component failed is so costly that it can only be done for a random sample of the systems under testing. Thus we observe all $\{X_i\}$ and a random subset of $\{\delta_i\}$.
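The data-generating mechanism of Example 1 can be simulated directly. The sketch below assumes exponential component lifetimes with hypothetical rates and a diagnosis probability $\rho$; none of these choices come from the paper. An unknown failure indicator is coded as $-1$.

```python
import numpy as np


def simulate_series_systems(n, rate_i, rate_ii, rho, rng):
    """Type I missingness as in Example 1, with hypothetical exponential lifetimes.

    Component I fails at T ~ Exp(rate_i), component II at C ~ Exp(rate_ii); each
    system is diagnosed (xi = 1) independently with probability rho.
    Returns X, xi, and delta with -1 marking an unknown failure indicator.
    """
    t = rng.exponential(1.0 / rate_i, n)
    c = rng.exponential(1.0 / rate_ii, n)
    x = np.minimum(t, c)                       # X = T ^ C: always observed
    delta = (t <= c).astype(int)               # delta = 1(T <= C)
    xi = rng.binomial(1, rho, n)               # diagnosis done completely at random
    return x, xi, np.where(xi == 1, delta, -1)


x, xi, delta_obs = simulate_series_systems(10_000, 1.0, 0.25, 0.8, np.random.default_rng(0))
print(xi.mean(), delta_obs[xi == 1].mean())   # roughly 0.8 and P(T <= C) = 0.8
```

Because diagnosis is independent of $(T, C)$, the diagnosed systems form a simple random subsample, which is exactly the "missing completely at random" condition defined above.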

Example 2. (Type II missingness). In medical studies, investigators are often interested in the time to death attributable to a particular disease, in which case $\delta_i = 1$ if and only if the $i$th subject died from that disease. Typically, the causes of death are unknown for some deaths because extra effort (e.g., performing autopsies or obtaining death certificates) is required to gather such information. Thus the values of $\{\delta_i\}$ may be missing among the deaths. On the other hand, if the $i$th subject has withdrawn from the study before its termination or is still alive at the end of the study, then $\delta_i$ must be 0. Hence, we have Type II missingness provided that the deaths with known causes are representative of all the subjects who died.

The most commonly adopted strategy for handling missing values is the complete-case analysis, which disregards all the subjects with unknown failure statuses. This approach is valid under Type I missingness; however, it can be highly inefficient if there is heavy missingness. For Type II missingness, the complete-case analysis does not even yield consistent estimators.

There have been a few articles on estimating the survival distribution of a homogeneous population in the presence of missing failure indicators. Notably, [5] used the nonparametric maximum likelihood method in conjunction with the EM algorithm to derive an estimator that is analogous to the Kaplan–Meier estimator (1.4). According to [13], however, the maximum likelihood and self-consistent estimators are in general nonunique and inconsistent. Two alternative estimators are proposed in [10] under Type I missingness. As will be discussed in Section 3, these estimators have some undesirable properties. On the more challenging regression problem, there has been little progress. The only solution seems to have been the modified log rank test for Example 2 proposed in [?]. As these authors acknowledge, they made some rather unrealistic assumptions, including independence between the covariate and the causes of death not under study, as well as proportionality of the hazard rate for the cause of interest and that of the other causes. On the other hand, further developments along the line of efficient estimation can be found in [11, ?, ?]. Furthermore, [17] deals with the additive hazards regression model.

This paper provides a treatment of Cox regression analysis and survival function estimation under both types of missingness. In the next section, we introduce a class of estimating functions for $\beta_0$ under Type I missingness which incorporates the partial information from the individuals with unknown $\delta_i$. The consistency and asymptotic normality of the resulting estimators are established. A simple adaptive estimator is constructed which has the smallest variance-covariance matrix among the proposed class of estimators, including the complete-case estimator. Simulation studies show that the adaptive estimator is suitable for practical use. Section 3 deals with survival function estimation under Type I missingness. For the one-sample case, we derive an adaptive estimator which offers considerable improvements over the complete-case and Lo's estimators [13]. Estimation of the cumulative baseline hazard function for the Cox model is also studied. In Section 4, we apply the ideas developed in Sections 2 and 3 to Type II missingness to obtain consistent estimators with similar optimality properties. Some of the technical developments there are streamlined and may be traced to a technical report [?]. We conclude the paper with some discussion in Section 5.



2. Cox regression under Type I missingness 



In this section, we propose estimating functions for the parameter vector $\beta_0$ which utilize the partial information from the subjects with unknown failure indicators. The asymptotic properties of these functions and the resulting parameter estimators are studied in detail. Throughout the paper, we make the following assumption, which is satisfied in virtually all practical situations.



Boundedness condition. The covariate processes $Z_i(\cdot) = \{Z_{i1}(\cdot), \ldots, Z_{ip}(\cdot)\}'$ $(i = 1, \ldots, n)$ are of bounded variation with a uniform bound, i.e., there exists $K > 0$ such that for all $i$,

$$\sum_{j=1}^{p}\left\{|Z_{ij}(0)| + \int_0^\infty |dZ_{ij}(t)|\right\} \le K.$$

Let $\xi_i$ indicate, by the value 1 vs. 0, whether $\delta_i$ is known or not. Under Type I missingness, the data consist of i.i.d. random vectors $(X_i, \xi_i, \xi_i\delta_i, Z_i')$ $(i = 1, \ldots, n)$, where $\xi_i$ is independent of $(X_i, \delta_i, Z_i)$ for every $i$. Write $\rho = P(\xi_i = 1)$.

Note that the partial likelihood score function (1.2) is the sum, over all the observed failure times, of the differences between the covariate vectors of the subjects who fail and the weighted averages of the covariate vectors among the subjects under observation. In view of this fact, we introduce the following estimating function:

$$S_1(\beta) = \sum_{i=1}^n \int_0^\infty \{Z_i(t) - \bar Z(\beta, t)\}\,\xi_i\,dN_i^u(t), \tag{2.1}$$

where $\bar Z(\beta, t) = \sum_{j=1}^n 1(X_j \ge t)\,e^{\beta' Z_j(t)} Z_j(t) \big/ \sum_{j=1}^n 1(X_j \ge t)\,e^{\beta' Z_j(t)}$ and $N_i^u(t) = \delta_i 1(X_i \le t)$. In the sequel, we shall also use the notation $Y_i(t) = 1(X_i \ge t)$, $N_i(t) = 1(X_i \le t)$ and $N_i^c(t) = (1 - \delta_i)1(X_i \le t)$. Note that $\{N_i^u, N_i^c\}$ may not be fully observable, whereas $\{N_i, \xi_i N_i^u, \xi_i N_i^c\}$ are always observed. Another way of deriving (2.1) is to modify the partial likelihood function (1.1) by omitting the factors for which the $\delta_i$ are missing. Then $S_1(\beta)$ can be obtained in the usual way by differentiating the resulting "log-likelihood function".
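To make the contrast in (2.1) concrete, the sketch below computes $S_1(\beta)$ together with the complete-case score $S_d(\beta)$ discussed in Remark (3) after Theorem 2.1. It assumes a scalar time-fixed covariate and no ties; the function name and the convention that unknown indicators are coded as $-1$ are ours. The only difference between the two is whether subjects with $\xi_j = 0$ enter the risk-set average.

```python
import numpy as np


def s1_and_sd(beta, x, xi, delta, z):
    """S_1(beta) of (2.1) and the complete-case score S_d(beta).

    Scalar time-fixed covariate, no tied times; delta is used only where xi == 1.
    """
    x = np.asarray(x, dtype=float)
    xi = np.asarray(xi)
    z = np.asarray(z, dtype=float)
    known_fail = (xi == 1) & (np.asarray(delta) == 1)     # xi_i * delta_i = 1
    s1 = sd = 0.0
    for i in np.flatnonzero(known_fail):
        w = np.exp(beta * z) * (x >= x[i])                # Y_j(X_i) e^{beta Z_j}
        s1 += z[i] - np.sum(w * z) / np.sum(w)            # Z-bar: everyone at risk
        w_cc = w * (xi == 1)
        sd += z[i] - np.sum(w_cc * z) / np.sum(w_cc)      # Z-bar_d: complete cases only
    return s1, sd


print(s1_and_sd(0.0, [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1], [1, -1, 0, 1], [1.0, 0.0, 1.0, 0.0]))
```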

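Theorem 2.1. Let $S_1(\beta, t) = \sum_{i=1}^n \int_0^t \{Z_i(s) - \bar Z(\beta, s)\}\,\xi_i\,dN_i^u(s)$.

(i) The process $n^{-1/2}S_1(\beta_0, \cdot)$ converges weakly to a zero-mean Gaussian martingale with variance function

$$V_1(t) = E\left[\int_0^t \{Z_1(s) - z(\beta_0, s)\}^{\otimes 2}\,\xi_1\,dN_1^u(s)\right], \tag{2.2}$$

where $z(\beta, t) = E\{Y_1(t)e^{\beta' Z_1(t)} Z_1(t)\} / E\{Y_1(t)e^{\beta' Z_1(t)}\}$.

(ii) Define $\hat\beta$ as the root of $\{S_1(\beta) = 0\}$. If $V_1 = V_1(\infty)$ is nonsingular, then $n^{1/2}(\hat\beta - \beta_0) \to N(0, V_1^{-1})$ in distribution.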

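Remarks. (1) It is simple to show that $V_1 = \rho V$, where $V$ is the limiting covariance matrix of $n^{-1/2}S(\beta_0)$ defined in Section 1. By the arguments of [1], $V_1(t)$ can be consistently estimated by

$$\hat V_1(t) = \frac{1}{n}\sum_{i=1}^n \int_0^t \left\{\frac{\sum_{j=1}^n Y_j(s)e^{\hat\beta' Z_j(s)} Z_j^{\otimes 2}(s)}{\sum_{j=1}^n Y_j(s)e^{\hat\beta' Z_j(s)}} - \bar Z^{\otimes 2}(\hat\beta, s)\right\}\xi_i\,dN_i^u(s).$$

(2) The nonsingularity of $V_1$ is a very mild assumption that holds in practically all meaningful situations.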
(3) The difference between the process $S_1(\beta, t)$ and the partial likelihood score process under the complete-case analysis,

$$S_d(\beta, t) = \sum_{i=1}^n \int_0^t \{Z_i(s) - \bar Z_d(\beta, s)\}\,\xi_i\,dN_i^u(s),$$

where $\bar Z_d(\beta, t) = \sum_{j=1}^n \xi_j Y_j(t)e^{\beta' Z_j(t)} Z_j(t) \big/ \sum_{j=1}^n \xi_j Y_j(t)e^{\beta' Z_j(t)}$, is that the subjects with unknown failure indicators are included in the calculation of $\bar Z$ but not in that of $\bar Z_d$. It is somewhat surprising that $S_d(\beta, \cdot)$ and the corresponding estimator $\hat\beta_d$ have the same asymptotic distributions as $S_1(\beta, \cdot)$ and $\hat\beta$, respectively, even though $\bar Z(\beta, t)$ is a more accurate estimator of $z(\beta, t)$ than $\bar Z_d(\beta, t)$ is. As will be seen in the proof of Theorem 2.1, however, $S_d(\beta, \cdot)$ and $\hat\beta_d$ themselves are not asymptotically equivalent to $S_1(\beta, \cdot)$ and $\hat\beta$. Simulation results reported later in this section reveal that $\hat\beta$ tends to be slightly more efficient than $\hat\beta_d$ for small and moderate-sized samples.

(4) The use of $S_1(\beta)$ may incur substantial loss of information, especially when $\rho$ is small, since the asymptotic distribution of $\hat\beta$ is the same as that of $\hat\beta_d$, which uses only the data with known failure indicators. Indeed, the purpose of this section is to construct a new estimator that combines $S_1(\beta)$ with an estimating function utilizing the counting processes $N_i(\cdot)$ associated with $\xi_i = 0$. In this connection, the estimating function $S_1$ plays only a transitional role.

Proof of Theorem 2.1. For notational simplicity, assume $p = 1$. Let $M_i(t) = N_i^u(t) - \int_0^t Y_i(s)e^{\beta_0 Z_i(s)}\lambda_0(s)\,ds$, which are martingales with respect to an appropriate $\sigma$-filtration [1]. Decompose $S_1(\beta_0, t)$ into two parts:

$$\begin{aligned}
S_1(\beta_0, t) ={}& \sum_{i=1}^n \int_0^t \{Z_i(s) - \bar Z(\beta_0, s)\}\,\xi_i\,dM_i(s) \\
&+ \sum_{i=1}^n (\xi_i - \rho)\int_0^t \{Z_i(s) - \bar Z(\beta_0, s)\}\,e^{\beta_0 Z_i(s)}Y_i(s)\lambda_0(s)\,ds \\
={}& S_{11}(t) + S_{12}(t), \text{ say}.
\end{aligned}$$

In the second term, $\xi_i$ may be replaced by $\xi_i - \rho$ because $\sum_{i=1}^n \{Z_i(s) - \bar Z(\beta_0, s)\}Y_i(s)e^{\beta_0 Z_i(s)} = 0$ for every $s$.

Now $n^{-1/2}S_{11}(\cdot)$ is a martingale. By the arguments of [1], $n^{-1/2}S_{11}(t)$ is asymptotically equivalent to $n^{-1/2}\tilde S_{11}(t) = n^{-1/2}\sum_{i=1}^n \int_0^t W_i(s)\,\xi_i\,dM_i(s)$, where $W_i(s) = Z_i(s) - z(\beta_0, s)$, and converges weakly in $D[0, \infty)$ to a Gaussian martingale with variance function $V_1(t)$. Note that the tightness of $n^{-1/2}S_{11}(\cdot)$ at $\infty$ can be easily handled along the lines of [?]. From Lemma 1(i), given at the end of the section, $n^{-1/2}S_{12}(t)$ is also tight and is asymptotically equivalent to

$$n^{-1/2}\tilde S_{12}(t) = n^{-1/2}\sum_{i=1}^n (\xi_i - \rho)\int_0^t W_i(s)\,e^{\beta_0 Z_i(s)}Y_i(s)\lambda_0(s)\,ds.$$

Hence, $n^{-1/2}S_1(\beta_0, \cdot)$ is asymptotically equivalent to $n^{-1/2}\{\tilde S_{11}(\cdot) + \tilde S_{12}(\cdot)\}$, which converges weakly to a zero-mean Gaussian process whose covariance function at $(t, t')$ can be shown to be

$$n^{-1}E\left[\{\tilde S_{11}(t) + \tilde S_{12}(t)\}\{\tilde S_{11}(t') + \tilde S_{12}(t')\}\right] = V_1(t \wedge t').$$

To prove part (ii) of the theorem, note that $-n^{-1}\partial S_1(\beta)/\partial\beta$ is positive (positive definite for $p > 1$) and converges to $E\int_0^\infty \{Z_1(t) - z(\beta, t)\}^2\,\xi_1\,dN_1^u(t)$. Thus $\hat\beta$ is uniquely defined, and the arguments of [?] entail the convergence of $n^{1/2}(\hat\beta - \beta_0)$ to $N(0, V_1^{-1})$. $\square$

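To incorporate the partial survival information from those subjects with missing $\delta_i$, it is natural to consider the counting processes $(1 - \xi_i)N_i(\cdot)$ and to subtract off the jumps due to censoring. In this connection, we introduce

$$S_2(\beta, t) = \sum_{i=1}^n \int_0^t \{Z_i(s) - \bar Z(\beta, s)\}\left\{(1 - \xi_i)\,dN_i(s) - \hat\rho^{-1}(1 - \hat\rho)\,\xi_i\,dN_i^c(s)\right\}, \tag{2.3}$$

where $\hat\rho = n^{-1}\sum_{j=1}^n \xi_j$, and write $S_2(\beta) = S_2(\beta, \infty)$. Note that $E\{(1 - \xi_i)N_i(t) - \rho^{-1}(1 - \rho)\xi_i N_i^c(t)\} = (1 - \rho)E\{N_i^u(t)\}$.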
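A minimal sketch of $S_2(\beta)$ in (2.3), under the same assumptions as before (scalar time-fixed covariate, no ties, unknown indicators coded as $-1$); the function name is ours. Each subject with $\xi_i = 0$ contributes its observed jump, failure or censoring; each known censored subject contributes with weight $-(1 - \hat\rho)/\hat\rho$, which removes the censoring jumps on average; known failures do not enter $S_2$.

```python
import numpy as np


def s2(beta, x, xi, delta, z):
    """S_2(beta) = S_2(beta, infinity) of (2.3); scalar time-fixed covariate, no ties.

    delta is used only where xi == 1.
    """
    x = np.asarray(x, dtype=float)
    xi = np.asarray(xi)
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta)
    rho_hat = xi.mean()
    # weight of the jump of subject i in {(1 - xi_i) dN_i - rho^{-1}(1 - rho) xi_i dN_i^c}
    weight = np.where(xi == 0, 1.0,
                      np.where(delta == 0, -(1.0 - rho_hat) / rho_hat, 0.0))
    total = 0.0
    for i in np.flatnonzero(weight != 0):
        w = np.exp(beta * z) * (x >= x[i])         # Y_j(X_i) e^{beta Z_j}
        total += weight[i] * (z[i] - np.sum(w * z) / np.sum(w))
    return total


print(s2(0.0, [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1], [1, -1, 0, 1], [1.0, 0.0, 1.0, 0.0]))
```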

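Theorem 2.2. The process $n^{-1/2}S_2(\beta_0, \cdot)$ is asymptotically independent of $n^{-1/2}S_1(\beta_0, \cdot)$ and converges weakly to a zero-mean Gaussian process with covariance function

$$V_2(t, t') = E\left\{\int_0^{t \wedge t'} W_1^{\otimes 2}(s)(1 - \xi_1)\,dN_1^u(s)\right\} + \rho^{-1}(1 - \rho)E\left[\{N_1^{cz}(t) - EN_1^{cz}(t)\}\{N_1^{cz}(t') - EN_1^{cz}(t')\}'\right],$$

where $W_i(s) = Z_i(s) - z(\beta_0, s)$ and $N_i^{cz}(t) = \int_0^t \{Z_i(s) - z(\beta_0, s)\}\,dN_i^c(s)$ $(i = 1, \ldots, n)$.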

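Proof. Again assume $p = 1$. Since $\hat\rho - \rho = O_p(n^{-1/2})$, by the usual delta method,

$$\begin{aligned}
S_2(\beta_0, t) ={}& \sum_{i=1}^n \int_0^t \{Z_i(s) - \bar Z(\beta_0, s)\}(1 - \xi_i)\,dM_i(s) \\
&+ \sum_{i=1}^n \int_0^t \{Z_i(s) - \bar Z(\beta_0, s)\}\{1 - \xi_i - (1 - \rho)\}\,e^{\beta_0 Z_i(s)}Y_i(s)\lambda_0(s)\,ds \\
&+ \sum_{i=1}^n \int_0^t \{Z_i(s) - \bar Z(\beta_0, s)\}\,dN_i^c(s)\,\rho^{-1}(\rho - \xi_i) \\
&+ \sum_{i=1}^n \int_0^t \{Z_i(s) - \bar Z(\beta_0, s)\}\,dN_i^c(s)\,\rho^{-1}(\hat\rho - \rho) + r_n(t) \\
={}& S_{21}(t) + S_{22}(t) + r_n(t), \text{ say},
\end{aligned}$$

where $S_{21}$ collects the first two terms and $S_{22}$ the next two. Here the remainder term $r_n$ is uniformly negligible in the sense that $\sup_t |r_n(t)| = o_p(n^{1/2})$. Note that $S_{21}(t)$ is the same as $S_1(\beta_0, t)$ except that $\{\xi_i\}$ there are replaced by $\{1 - \xi_i\}$. Thus $n^{-1/2}S_{21}(\cdot)$ is tight in $D[0, \infty)$, and $S_{21}(t)$ is asymptotically equivalent to

$$\tilde S_{21}(t) = \sum_{i=1}^n \int_0^t W_i(s)(1 - \xi_i)\,dM_i(s) + \sum_{i=1}^n \int_0^t W_i(s)(\rho - \xi_i)\,e^{\beta_0 Z_i(s)}Y_i(s)\lambda_0(s)\,ds.$$

By Lemma 1(ii) and the law of large numbers, $S_{22}(t)$ is asymptotically equivalent to

$$\tilde S_{22}(t) = \rho^{-1}\sum_{i=1}^n (\rho - \xi_i)\{N_i^{cz}(t) - EN_1^{cz}(t)\}.$$

By writing

$$\tilde S_{21}(t) = \sum_{i=1}^n \int_0^t W_i(s)(1 - \rho)\,dM_i(s) + \sum_{i=1}^n \int_0^t W_i(s)\,dN_i^u(s)\,(\rho - \xi_i),$$

we can show that, for any $t$ and $t'$,

$$E\{\tilde S_{21}(t)\tilde S_{22}(t')\} = 0. \tag{2.4}$$

Thus $n^{-1/2}\{\tilde S_{21}(\cdot) + \tilde S_{22}(\cdot)\}$ converges weakly to a zero-mean Gaussian process with $V_2$ as its covariance function.

Similar to (2.4), $E[\{\tilde S_{11}(t) + \tilde S_{12}(t)\}\tilde S_{22}(t')] = 0$ for any $t$ and $t'$. Thus, to prove the asymptotic independence between $S_1$ and $S_2$, it suffices to show that

$$E\left[\{\tilde S_{11}(t) + \tilde S_{12}(t)\}\tilde S_{21}(t')\right] = 0.$$

To this end, we can apply the same covariance calculation as employed in the proof of Theorem 2.1 to show that

$$\begin{aligned}
E\left[\{\tilde S_{11}(t) + \tilde S_{12}(t)\}\tilde S_{21}(t')\right] = nE\Big[&\Big\{\int_0^t W_1(s)\,\xi_1\,dM_1(s) + (\xi_1 - \rho)\int_0^t W_1(s)\,e^{\beta_0 Z_1(s)}Y_1(s)\lambda_0(s)\,ds\Big\} \\
&\times \Big\{\int_0^{t'} W_1(s')(1 - \xi_1)\,dM_1(s') + (\rho - \xi_1)\int_0^{t'} W_1(s')\,e^{\beta_0 Z_1(s')}Y_1(s')\lambda_0(s')\,ds'\Big\}\Big] = 0.
\end{aligned}$$

$\square$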



By combining $S_1$ and $S_2$, more efficient estimators of $\beta_0$ may be obtained. Specifically, given a $p \times p$ matrix $D$, we can define $\hat\beta_D$ as a solution to

$$S_1(\beta) + DS_2(\beta) = 0. \tag{2.5}$$

Theorem 2.3. Suppose that $\{\rho V + (1 - \rho)DV\}$ is nonsingular. Let $V_2 = V_2(\infty, \infty)$. Then $n^{1/2}(\hat\beta_D - \beta_0) \to N(0, \Sigma(D))$ in distribution, where

$$\Sigma(D) = \{\rho V + (1 - \rho)DV\}^{-1}(\rho V + DV_2D')\left[\{\rho V + (1 - \rho)DV\}'\right]^{-1}. \tag{2.6}$$

In particular, $D^* = (1 - \rho)VV_2^{-1}$ yields

$$\Sigma(D^*) = \{\rho V + (1 - \rho)^2 VV_2^{-1}V\}^{-1}$$

and is optimal in the sense that $\Sigma(D) - \Sigma(D^*)$ is nonnegative definite for any $D$.
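The optimality claim of Theorem 2.3 can be checked numerically. The sketch below implements (2.6) with NumPy for hypothetical $p = 2$ matrices $V$ and $V_{cz}$ (the values are ours, chosen only to be positive definite), builds $V_2$ as in Remark (1) following the theorem, and compares $\Sigma(D^*)$ with its closed form. The tests also check that $\Sigma(D) - \Sigma(D^*)$ has no negative eigenvalues for randomly drawn $D$.

```python
import numpy as np


def sigma(D, V, V2, rho):
    """Asymptotic covariance Sigma(D) of (2.6) for the root of S_1 + D S_2 = 0."""
    A = rho * V + (1.0 - rho) * D @ V          # limit of -n^{-1} d{S_1 + D S_2}/d beta
    A_inv = np.linalg.inv(A)
    return A_inv @ (rho * V + D @ V2 @ D.T) @ A_inv.T


def optimal_weight(V, V2, rho):
    """D* = (1 - rho) V V_2^{-1} from Theorem 2.3."""
    return (1.0 - rho) * V @ np.linalg.inv(V2)


# Hypothetical p = 2 inputs.
V = np.array([[2.0, 0.3], [0.3, 1.0]])
V_cz = np.array([[0.5, 0.1], [0.1, 0.4]])
rho = 0.6
V2 = (1.0 - rho) * V + (1.0 - rho) / rho * V_cz
D_star = optimal_weight(V, V2, rho)
# Diagonal of Sigma(D*) versus Sigma(0) = (rho V)^{-1}, the S_1-only estimator.
print(np.diag(sigma(D_star, V, V2, rho)), np.diag(sigma(np.zeros((2, 2)), V, V2, rho)))
```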

Remarks. (1) Let $V_{cz} = E\{N_1^{cz}(\infty) - EN_1^{cz}(\infty)\}^{\otimes 2}$. Then $V_2 = (1 - \rho)V + \rho^{-1}(1 - \rho)V_{cz}$. For $p = 1$, $D^* = V/(V + \rho^{-1}V_{cz})$ and $\Sigma(D^*) = (V + \rho^{-1}V_{cz})/\{V(V + V_{cz})\}$. This variance will be close to the ideal $V^{-1}$ if either $\rho$ is close to 1 (light missingness) or $V_{cz}$ is close to zero (light censorship).
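For $p = 1$ the remark can be verified with plain arithmetic. The sketch below evaluates (2.6) as a scalar function of $D$ and the closed form of $\Sigma(D^*)$; the values $V = 1$, $V_{cz} = 0.5$, $\rho = 0.5$ are hypothetical. With these numbers $\Sigma(D^*) = 4/3$, compared with $1/(\rho V) = 2$ for the estimator based on $S_1$ alone (or the complete cases) and $1/V = 1$ with full data.

```python
def sigma_scalar(d, v, vcz, rho):
    """Sigma(D) of (2.6) for p = 1, with V_2 = (1 - rho) V + (1 - rho) Vcz / rho."""
    v2 = (1.0 - rho) * v + (1.0 - rho) / rho * vcz
    return (rho * v + d * d * v2) / (rho * v + (1.0 - rho) * d * v) ** 2


def sigma_star_scalar(v, vcz, rho):
    """Closed form from Remark (1): (V + Vcz / rho) / {V (V + Vcz)}."""
    return (v + vcz / rho) / (v * (v + vcz))


v, vcz, rho = 1.0, 0.5, 0.5                 # hypothetical values
d_star = v / (v + vcz / rho)                # D* for p = 1
print(sigma_star_scalar(v, vcz, rho), sigma_scalar(d_star, v, vcz, rho),
      1.0 / (rho * v), 1.0 / v)             # adaptive, via (2.6), S_1 only, full data
```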

(2) A consistent estimator for $\Sigma(D)$ may be obtained by replacing $\rho$, $V$ and $V_2$ in (2.6) by $\hat\rho$, $\hat V(\hat\beta)$ and $\hat V_2(\hat\beta)$, where

$$\hat V(\beta) = \frac{1}{\sum_{i=1}^n \xi_i}\sum_{i=1}^n \int_0^\infty \left\{\frac{\sum_{j=1}^n Y_j(t)e^{\beta' Z_j(t)} Z_j^{\otimes 2}(t)}{\sum_{j=1}^n Y_j(t)e^{\beta' Z_j(t)}} - \bar Z^{\otimes 2}(\beta, t)\right\}\xi_i\,dN_i^u(t),$$

$$\hat V_2(\beta) = (1 - \hat\rho)\hat V(\beta) + \hat\rho^{-1}(1 - \hat\rho)\hat V_{cz}(\beta),$$

$$\hat V_{cz}(\beta) = \frac{1}{\sum_{i=1}^n \xi_i}\sum_{i=1}^n \int_0^\infty \{Z_i(t) - \bar Z(\beta, t)\}^{\otimes 2}\xi_i\,dN_i^c(t) - \left[\frac{1}{\sum_{i=1}^n \xi_i}\sum_{i=1}^n \int_0^\infty \{Z_i(t) - \bar Z(\beta, t)\}\,\xi_i\,dN_i^c(t)\right]^{\otimes 2}.$$

Table 1
Monte Carlo estimates for the sampling means and variances of four estimators of $\beta_0$ and for the sizes of the corresponding 0.05-level Wald tests for testing $H_0: \beta_0 = 0$ under the model $\lambda(t \mid Z) = 1$

                              20% Censoring          50% Censoring          70% Censoring
$\rho$  Estimator           Mean    Var.   Size    Mean    Var.   Size    Mean    Var.   Size
0.8     $\hat\beta_f$      -0.001   0.015  0.056  -0.001   0.023  0.054   0.002   0.038  0.052
        $\hat\beta_d$      -0.001   0.018  0.053  -0.002   0.030  0.052   0.002   0.049  0.049
        $\hat\beta$        -0.001   0.018  0.054  -0.001   0.029  0.053   0.002   0.049  0.050
        $\hat\beta^*$      -0.001   0.015  0.056  -0.002   0.027  0.055   0.002   0.046  0.051
0.5     $\hat\beta_f$      -0.001   0.015  0.056  -0.001   0.023  0.054   0.002   0.038  0.052
        $\hat\beta_d$      -0.002   0.032  0.057  -0.002   0.054  0.053   0.001   0.092  0.049
        $\hat\beta$         0.001   0.029  0.050  -0.001   0.048  0.052   0.003   0.082  0.050
        $\hat\beta^*$      -0.001   0.017  0.056  -0.002   0.037  0.052   0.002   0.071  0.048

NOTE: $Z$ is standard normal. The censoring time is exponentially distributed with hazard rate $\lambda_c$, where $\lambda_c$ is chosen to achieve the desired censoring percentage. The sample size is 100. Each block is based on 10,000 replications. The random number generator of [16] is used.

Since we can estimate the optimal weight $D^*$ consistently by $\hat D^* = (1 - \hat\rho)\hat V(\hat\beta)\hat V_2^{-1}(\hat\beta)$, an "adaptive" estimator of $\beta_0$ that achieves the lower variance-covariance bound $\Sigma(D^*)$ may be constructed. Specifically, we can first use $\hat\beta$ from $\{S_1(\beta) = 0\}$ to compute $\hat D^*$ and then obtain the adaptive estimator by solving

$$S_1(\beta) + \hat D^* S_2(\beta) = 0. \tag{2.7}$$

Corollary 1. Let $\hat\beta^*$ be the estimator given by (2.7). Then, under the same assumptions as Theorem 2.3, $n^{1/2}(\hat\beta^* - \beta_0) \to N(0, \Sigma(D^*))$ in distribution. In addition, $\Sigma(D^*)$ can be consistently estimated by $\{\hat\rho\hat V(\hat\beta^*) + (1 - \hat\rho)^2\hat V(\hat\beta^*)\hat V_2^{-1}(\hat\beta^*)\hat V(\hat\beta^*)\}^{-1}$.
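The two-step construction of (2.7) can be sketched end to end for a scalar time-fixed covariate without ties. This is an illustration under stated assumptions, not the code used for Tables 1 and 2: the helper names, the bisection root finder on the bracket $[-3, 3]$, and the simulation design ($n = 400$, $\rho = 0.5$, $\lambda(t \mid Z) = e^{0.5Z}$, exponential censoring with rate 0.5) are all our own choices. Step one solves $S_1(\beta) = 0$; step two plugs $\hat D^*$ from Remark (2) into (2.7); the variance estimate follows Corollary 1.

```python
import numpy as np


def risk_moments(beta, x, z):
    """Zbar(beta, X_i) and the at-risk weighted variance of Z at each X_i (no ties)."""
    w = np.exp(beta * z)
    at_risk = x[None, :] >= x[:, None]                 # row i marks the risk set at X_i
    s0, s1, s2 = at_risk @ w, at_risk @ (w * z), at_risk @ (w * z * z)
    zbar = s1 / s0
    return zbar, s2 / s0 - zbar ** 2


def estimating_functions(beta, x, xi, delta, z):
    """S_1(beta) of (2.1) and S_2(beta) of (2.3), both evaluated at t = infinity."""
    zbar, _ = risk_moments(beta, x, z)
    rho = xi.mean()
    resid = z - zbar
    s1 = np.sum(resid * xi * delta)
    s2 = np.sum(resid * ((1 - xi) - (1 - rho) / rho * xi * (1 - delta)))
    return s1, s2


def weight_and_variances(beta, x, xi, delta, z):
    """D-hat*, V-hat(beta) and V-hat_2(beta) of Remark (2) for p = 1."""
    zbar, var_at_risk = risk_moments(beta, x, z)
    rho, m = xi.mean(), xi.sum()
    v = np.sum(xi * delta * var_at_risk) / m
    jump_cz = (z - zbar) * (1 - delta)                 # jump of N_i^cz when censored
    v_cz = np.sum(xi * jump_cz ** 2) / m - (np.sum(xi * jump_cz) / m) ** 2
    v2 = (1 - rho) * v + (1 - rho) / rho * v_cz
    return (1 - rho) * v / v2, v, v2


def bisect(f, lo, hi, tol=1e-10):
    """Root of a continuous f on [lo, hi]; the endpoints must bracket a sign change."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def adaptive_estimate(x, xi, delta, z, bracket=(-3.0, 3.0)):
    """Two-step adaptive estimator of (2.7) and its variance estimate (Corollary 1)."""
    beta_hat = bisect(lambda b: estimating_functions(b, x, xi, delta, z)[0], *bracket)
    d_star, _, _ = weight_and_variances(beta_hat, x, xi, delta, z)

    def combined(b):
        s1, s2 = estimating_functions(b, x, xi, delta, z)
        return s1 + d_star * s2

    beta_star = bisect(combined, *bracket)
    _, v, v2 = weight_and_variances(beta_star, x, xi, delta, z)
    rho = xi.mean()
    sigma_star = 1.0 / (rho * v + (1 - rho) ** 2 * v * v / v2)
    return beta_hat, beta_star, d_star, sigma_star


# Hypothetical simulation: lambda(t | Z) = e^{0.5 Z}, Exp(rate 0.5) censoring.
rng = np.random.default_rng(2024)
n, beta0, rho = 400, 0.5, 0.5
z = rng.normal(size=n)
t_fail = rng.exponential(1.0 / np.exp(beta0 * z))
c = rng.exponential(2.0, size=n)
x = np.minimum(t_fail, c)
xi = rng.binomial(1, rho, size=n).astype(float)
delta = np.where(xi == 1, (t_fail <= c).astype(float), 0.0)   # unknown statuses never used
beta_hat, beta_star, d_star, sigma_star = adaptive_estimate(x, xi, delta, z)
print(beta_hat, beta_star, d_star, np.sqrt(sigma_star / n))
```

Bisection is used only to keep the sketch dependency-free; for $p = 1$ the estimated weight satisfies $0 < \hat D^* < 1$, since $\hat D^* = \hat V/(\hat V + \hat\rho^{-1}\hat V_{cz})$.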

We have carried out extensive Monte Carlo experiments to investigate the finite-sample behaviour of the proposed adaptive estimator $\hat\beta^*$ and to compare it with the full-data estimator $\hat\beta_f$, the complete-case estimator $\hat\beta_d$ and the $S_1(\beta)$ estimator $\hat\beta$. The key results are summarized in Tables 1 and 2. The biases of all four estimators and of their variance estimators (the latter not shown here) are negligible, and the associated Wald tests have proper sizes. The adaptive estimator is always more efficient than $\hat\beta_d$ and $\hat\beta$, as is reflected in the sampling variances of the estimators as well as in the powers of the Wald tests. The gains in relative efficiency increase as the missing probability increases and decrease as the censoring probability increases. The efficiency of $\hat\beta^*$ relative to $\hat\beta_f$ is close to 1 when censoring is light. The estimator $\hat\beta$ seems to have slightly better small-sample efficiency than $\hat\beta_d$.
Table 2
Monte Carlo estimates for the sampling means and variances of four estimators of $\beta_0$ and for the powers of the corresponding 0.05-level Wald tests for testing $H_0: \beta_0 = 0$ under the model $\lambda(t \mid Z) = e^{0.5Z}$

                              20% Censoring          50% Censoring          70% Censoring
$\rho$  Estimator           Mean    Var.   Power   Mean    Var.   Power   Mean    Var.   Power
0.8     $\hat\beta_f$       0.509   0.017  0.984   0.511   0.026  0.912   0.514   0.041  0.755
        $\hat\beta_d$       0.511   0.021  0.956   0.514   0.033  0.844   0.518   0.053  0.655
        $\hat\beta$         0.510   0.021  0.955   0.513   0.033  0.844   0.516   0.052  0.653
        $\hat\beta^*$       0.509   0.018  0.980   0.512   0.030  0.876   0.516   0.049  0.684
0.5     $\hat\beta_f$       0.509   0.017  0.984   0.511   0.026  0.912   0.514   0.041  0.755
        $\hat\beta_d$       0.516   0.038  0.813   0.522   0.061  0.624   0.530   0.102  0.435
        $\hat\beta$         0.511   0.035  0.821   0.514   0.054  0.642   0.520   0.088  0.450
        $\hat\beta^*$       0.509   0.021  0.960   0.512   0.042  0.757   0.518   0.077  0.514

NOTE: See NOTE of Table 1.

Proof of Theorem 2.3. From its definition, $-\partial S_1(\beta)/\partial\beta$ is, with probability 1, positive definite. Thus, following [1], we can show that $(\rho n)^{-1}S_1(\beta)$ converges uniformly in any compact set to the nonrandom function

$$m(\beta) = E\int_0^\infty \{Z_1(t) - z(\beta, t)\}\,dN_1^u(t).$$



For $S_2(\beta)$, we have, by the law of large numbers,

$$\begin{aligned}
&n^{-1}\sum_{i=1}^n \int_0^\infty \{Z_i(t) - \bar Z(\beta, t)\}(1 - \xi_i)\,dN_i(t) - n^{-1}\sum_{i=1}^n \int_0^\infty \{Z_i(t) - \bar Z(\beta, t)\}(1 - \rho)\,dN_i(t) \\
&\qquad= o_p(1) + \int_0^\infty \bar Z(\beta, t)\,d\Big\{n^{-1}\sum_{i=1}^n N_i(t)(\xi_i - \rho)\Big\} = o_p(1),
\end{aligned}$$

where the last equality follows from the facts that $\sup_t |n^{-1}\sum_{i=1}^n N_i(t)(\xi_i - \rho)| = o_p(n^{-1/4})$ and that the total variation of $\bar Z(\beta, \cdot)$ is at most $O(\log n)$ uniformly for $\beta$ in any compact region. Thus the order $o_p(1)$ is also uniform. Continuing this line of argument, we get

$$n^{-1}S_2(\beta) = n^{-1}\sum_{i=1}^n \int_0^\infty \{Z_i(t) - \bar Z(\beta, t)\}(1 - \rho)\,dN_i^u(t) + o_p(1) = (1 - \rho)m(\beta) + o_p(1)$$

with the same uniformity. Thus $n^{-1}\{S_1(\beta) + DS_2(\beta)\}$ is uniformly approximated by $\{\rho I + (1 - \rho)D\}m(\beta)$, which has a unique root $\beta_0$. Hence $\hat\beta_D \to \beta_0$ in probability.

The asymptotic normality is now easier to show. Taking the Taylor series expansion of $\{S_1(\beta) + DS_2(\beta)\}$ at $\beta_0$, we get

$$n^{1/2}(\hat\beta_D - \beta_0) = \left[\{\rho I + (1 - \rho)D\}V\right]^{-1} n^{-1/2}\{S_1(\beta_0) + DS_2(\beta_0)\} + o_p(1),$$

which, by Theorems 2.1 and 2.2 and a straightforward matrix manipulation, converges to the desired normal distribution.
To verify the optimality of $D^*$, we note that the estimating function can be linearized around $\beta_0$ and the limiting normal random vectors may be used in place of $n^{-1/2}S_k(\beta_0)$ $(k = 1, 2)$. Specifically, we can consider the following "limiting" linear model:

$$S_1^* = \rho Vb + S_{01}^*, \qquad S_2^* = (1 - \rho)Vb + S_{02}^*,$$

where $S_{0k}^*$ $(k = 1, 2)$ are independent $N(0, V_k)$. Recall that $V_1 = \rho V$ and $V_2 = (1 - \rho)(V + \rho^{-1}V_{cz})$. By the Gauss–Markov theorem, the best linear unbiased estimator of $b$ is

$$\hat b^* = \{\rho V + (1 - \rho)^2 VV_2^{-1}V\}^{-1}\{S_1^* + (1 - \rho)VV_2^{-1}S_2^*\},$$

with variance-covariance matrix $\{\rho V + (1 - \rho)^2 VV_2^{-1}V\}^{-1}$, which is exactly $\Sigma(D^*)$. $\square$
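The Gauss–Markov step can be reproduced as an ordinary generalized least squares computation. The sketch below stacks the two equations of the limiting model, with block-diagonal error covariance $\mathrm{diag}(\rho V, V_2)$, and compares the GLS covariance with $\{\rho V + (1 - \rho)^2VV_2^{-1}V\}^{-1}$. The matrices are hypothetical values chosen by us.

```python
import numpy as np


def gls_limiting_model(V, V2, rho):
    """GLS for S1* = rho V b + e1, S2* = (1 - rho) V b + e2.

    e1 ~ N(0, rho V) and e2 ~ N(0, V_2) are independent, as in the limiting model.
    Returns the covariance of the best linear unbiased estimator of b and the
    matrix C such that b_hat = C @ [S1*; S2*].
    """
    p = V.shape[0]
    X = np.vstack([rho * V, (1.0 - rho) * V])                 # stacked design matrix
    zero = np.zeros((p, p))
    Omega = np.block([[rho * V, zero], [zero, V2]])           # block-diagonal error covariance
    Omega_inv = np.linalg.inv(Omega)
    cov = np.linalg.inv(X.T @ Omega_inv @ X)                  # Gauss-Markov covariance
    return cov, cov @ X.T @ Omega_inv


V = np.array([[1.5, 0.2], [0.2, 0.8]])          # hypothetical V
V_cz = np.array([[0.4, 0.05], [0.05, 0.3]])     # hypothetical V_cz
rho = 0.7
V2 = (1.0 - rho) * (V + V_cz / rho)
cov, C = gls_limiting_model(V, V2, rho)
target = np.linalg.inv(rho * V + (1.0 - rho) ** 2 * V @ np.linalg.inv(V2) @ V)
print(np.allclose(cov, target))
```

The returned coefficient matrix equals $\Sigma(D^*)[\,I,\ (1 - \rho)VV_2^{-1}\,]$, i.e., GLS weights $S_2^*$ by exactly $D^*$.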

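Lemma 1. (i) The process

$$n^{-1/2}S_{12}(t) = n^{-1/2}\sum_{i=1}^n (\xi_i - \rho)\int_0^t \{Z_i(s) - \bar Z(\beta_0, s)\}\,e^{\beta_0' Z_i(s)}Y_i(s)\lambda_0(s)\,ds$$

is tight in $D[0, \infty)$ and is asymptotically equivalent to

$$n^{-1/2}\tilde S_{12}(t) = n^{-1/2}\sum_{i=1}^n (\xi_i - \rho)\int_0^t \{Z_i(s) - z(\beta_0, s)\}\,e^{\beta_0' Z_i(s)}Y_i(s)\lambda_0(s)\,ds$$

in the sense that $\sup_t n^{-1/2}\|S_{12}(t) - \tilde S_{12}(t)\| = o_p(1)$.

(ii) The process $n^{-1/2}\sum_{i=1}^n (\xi_i - \rho)\int_0^t \{Z_i(s) - \bar Z(\beta_0, s)\}\,dN_i^c(s)$ is tight and asymptotically equivalent to

$$n^{-1/2}\sum_{i=1}^n (\xi_i - \rho)\int_0^t \{Z_i(s) - z(\beta_0, s)\}\,dN_i^c(s).$$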
i=i - 70 

Proof. Without loss of generality, assume $p = 1$. For any $t_1 < t_2$, we have

$$\limsup_{n\to\infty} E\left\{n^{-1/2}S_{12}(t_2) - n^{-1/2}S_{12}(t_1)\right\}^4 \le \cdots \le (2K)^4\{\rho(1 - \rho)\}^2\left[E\int_{t_1}^{t_2} Y_1(s)\,e^{\beta_0 Z_1(s)}\lambda_0(s)\,ds\right]^2.$$
Since $\mu[t_1, t_2] = E\int_{t_1}^{t_2} Y_1(s)\,e^{\beta_0 Z_1(s)}\lambda_0(s)\,ds$ is a finite measure on $[0, \infty)$, the moment criterion ([?], page 52, formula (30)) implies the tightness of $n^{-1/2}S_{12}$. Likewise, $n^{-1/2}\tilde S_{12}$ is also tight. Furthermore, let $t_0 = \inf\{t : EY_1(t) = 0\}$. Then it is easy to see that $\sup_{s \le t}|\bar Z(\beta_0, s) - z(\beta_0, s)| \to 0$ in probability for any $t < t_0$. Thus the equivalence of $n^{-1/2}S_{12}$ and $n^{-1/2}\tilde S_{12}$ follows from the tightness just proved. Hence (i) holds.

The proof of (ii) is very much the same as that of (i). Because of the possible discontinuity of $EN_1^c(t)$ in $t$, another moment condition ([12], page 51, formula (25)) should be used. Note that the tightness continues to hold even if the measure $\mu$ there is discontinuous. $\square$



3. Cumulative hazard function estimation under Type I missingness 

In this section, we first deal with the problem of nonparametric estimation of the cumulative hazard function for a homogeneous population under Type I missingness. We discuss the estimators proposed in [10] and give our own solutions. We then apply the ideas to the estimation of the cumulative baseline hazard function for the Cox model. In both cases, the asymptotic distributions of the relevant estimators are derived.

In the one-sample case, the observations consist of i.i.d. random vectors $(X_i, \xi_i, \xi_i\delta_i)$ $(i = 1, \ldots, n)$, where $X_i = T_i \wedge C_i$, $\delta_i = 1(T_i \le C_i)$ and $\xi_i$ is the missingness indicator, independent of $(X_i, \delta_i)$. Assume that $T_i$ is independent of $C_i$ and that $T_i$ has a continuous distribution function. Let $F(t) = P(T_1 \le t)$, $\Lambda(t) = \int_0^t dF(s)/\{1 - F(s)\}$, $G(t) = P(C_1 \le t)$, $\Lambda_G(t) = \int_0^t dG(s)/\{1 - G(s-)\}$, $H(t) = \{1 - F(t)\}\{1 - G(t-)\}$, $A(t) = \int_0^t d\Lambda(s)/H(s)$ and $A_G(t) = \int_0^t dG(s)/[\{1 - G(s-)\}H(s)]$. The notation $Y_i$, $N_i$, $N_i^u$, $N_i^c$, $\rho$, $\hat\rho$, etc. introduced in Section 2 will also be used.

Under the setup described above, [10] shows that the nonparametric maximum likelihood method typically does not yield a consistent estimator for $F$, indicating that this problem is far more complicated than the complete-data situation. Two alternative estimators, $\hat F_A$ and $\hat F_B$, are also proposed there. It can be shown, by expanding $\log(1 - \hat F_A)$, that $\hat F_A$ is not a consistent estimator; in particular, Theorem 3 of [10] is not valid. In our notation, the second estimator is given by

Motivated by (3.1), we modify (1.3) to obtain the following estimator for $\Lambda(t)$:

$$\hat\Lambda_1(t) = \int_0^t \frac{\sum_{i=1}^n \xi_i\,dN_i^u(s)}{\hat\rho\sum_{i=1}^n Y_i(s)}. \tag{3.2}$$

By the exponentiation formula of Doléans-Dade ([1], p. 897), the corresponding estimator for $F(t)$ is

$$\hat F_1(t) = 1 - \prod_{i:\, X_i \le t}\left\{1 - \frac{\xi_i\delta_i}{\hat\rho\sum_{j=1}^n Y_j(X_i)}\right\}. \tag{3.3}$$

It is easily seen that $\hat F_1$ and $\hat F_B$ are asymptotically equivalent; however, the cumulative hazard function approach is more convenient for our later developments.
Expression (3.2) also reveals that $\hat\Lambda_1$ (and hence $\hat F_B$ and $\hat F_1$) does not utilize the counting process information from the subjects with $\xi_i = 0$. To recover this information, we introduce

$$\hat\Lambda_2(t) = \int_0^t \frac{\sum_{i=1}^n \{(1 - \xi_i)\,dN_i(s) - \hat\rho^{-1}(1 - \hat\rho)\,\xi_i\,dN_i^c(s)\}}{(1 - \hat\rho)\sum_{i=1}^n Y_i(s)}, \tag{3.4}$$

which shares the same spirit as the estimating function $S_2(\beta)$ given in (2.3). Thus $\Lambda(t)$ can be estimated by

$$\hat\Lambda(a, t) = a\hat\Lambda_1(t) + (1 - a)\hat\Lambda_2(t), \tag{3.5}$$

where $a \in [0, 1]$.
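A minimal sketch of $\hat\Lambda_1$, $\hat\Lambda_2$ and their combination (3.5), assuming no tied times; the function names are ours, and the simulated data (standard exponential failures, exponential censoring with rate 0.25, $\rho = 0.8$) are a hypothetical design. Unknown indicators can be coded as anything (here NaN), because only $\xi_i\delta_i$ and $\xi_i(1 - \delta_i)$ are used.

```python
import numpy as np


def hazard_estimates(x, xi, delta, t):
    """Lambda_1(t) of (3.2) and Lambda_2(t) of (3.4) for one sample, no tied times.

    delta is only used where xi == 1 (unknown indicators may hold any value).
    """
    x = np.asarray(x, dtype=float)
    xi = np.asarray(xi, dtype=float)
    delta = np.where(xi == 1, np.asarray(delta, dtype=float), 0.0)
    rho = xi.mean()
    # sum_j Y_j(X_i) = number of subjects with X_j >= X_i
    at_risk = len(x) - np.searchsorted(np.sort(x), x, side="left")
    use = x <= t
    fail_known = xi * delta            # jumps of xi_i N_i^u
    cens_known = xi * (1.0 - delta)    # jumps of xi_i N_i^c
    lam1 = np.sum(use * fail_known / at_risk) / rho
    lam2 = np.sum(use * ((1.0 - xi) - (1.0 - rho) / rho * cens_known) / at_risk) / (1.0 - rho)
    return lam1, lam2


def combined_hazard(a, lam1, lam2):
    """Lambda(a, t) = a Lambda_1(t) + (1 - a) Lambda_2(t), as in (3.5)."""
    return a * lam1 + (1.0 - a) * lam2


rng = np.random.default_rng(7)
n, rho = 20000, 0.8
t_fail = rng.exponential(1.0, n)     # F(t) = 1 - e^{-t}, so Lambda(t) = t
c = rng.exponential(4.0, n)          # censoring rate 0.25 (about 20% censored)
x = np.minimum(t_fail, c)
xi = rng.binomial(1, rho, n)
delta = np.where(xi == 1, (t_fail <= c).astype(float), np.nan)  # unknown -> NaN
lam1, lam2 = hazard_estimates(x, xi, delta, np.log(2.0))
print(lam1, lam2, combined_hazard(xi.mean(), lam1, lam2))  # each should be near log 2
```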

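Theorem 3.1. Let $t_0$ be such that $H(t_0) > 0$. Then $n^{1/2}\{\hat\Lambda(a, \cdot) - \Lambda(\cdot)\}$ converges weakly in $D[0, t_0]$ to a zero-mean Gaussian process with covariance function

$$\begin{aligned}
\Gamma_a(t, t') ={}& \frac{a^2}{\rho}\{A(t \wedge t') - (1 - \rho)\Lambda(t)\Lambda(t')\} \\
&+ a(1 - a)\{2\Lambda(t)\Lambda(t') + \rho^{-1}\Lambda(t)\Lambda_G(t') + \rho^{-1}\Lambda(t')\Lambda_G(t)\} \\
&+ \frac{(1 - a)^2}{1 - \rho}\left[A(t \wedge t') + \rho^{-1}A_G(t \wedge t') - \rho\{\Lambda(t) + \rho^{-1}\Lambda_G(t)\}\{\Lambda(t') + \rho^{-1}\Lambda_G(t')\}\right].
\end{aligned} \tag{3.6}$$

For fixed $t$, $n^{1/2}\{\hat\Lambda(a, t) - \Lambda(t)\} \to N(0, \Gamma_a(t))$ in distribution, where

$$\Gamma_a(t) = \frac{a^2}{\rho}\{A(t) - (1 - \rho)\Lambda^2(t)\} + 2a(1 - a)\{\Lambda^2(t) + \rho^{-1}\Lambda(t)\Lambda_G(t)\} + \frac{(1 - a)^2}{1 - \rho}\left[A(t) + \rho^{-1}A_G(t) - \rho\{\Lambda(t) + \rho^{-1}\Lambda_G(t)\}^2\right], \tag{3.7}$$

which reaches its minimum when $a$ equals

$$a^* = \frac{\rho\{A(t) - \Lambda^2(t)\} + A_G(t) - \Lambda_G^2(t) - (1 + \rho)\Lambda(t)\Lambda_G(t)}{A(t) - \Lambda^2(t) + A_G(t) - \Lambda_G^2(t) - 2\Lambda(t)\Lambda_G(t)}.$$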

Remarks. (1) If $\hat a_n \to a$ in probability, then $\hat\Lambda(\hat a_n, \cdot)$ has the same asymptotic distribution as $\hat\Lambda(a, \cdot)$. Since $a^*$ can be estimated consistently, the "optimal" estimator of $\Lambda$ can be constructed adaptively. To be specific, $a^*$ may be estimated by

$$\hat a^* = \frac{\hat\rho\{\hat A_1(t) - \hat\Lambda_1^2(t)\} + \hat A_G(t) - \hat\Lambda_G^2(t) - (1 + \hat\rho)\hat\Lambda_1(t)\hat\Lambda_G(t)}{\hat A_1(t) - \hat\Lambda_1^2(t) + \hat A_G(t) - \hat\Lambda_G^2(t) - 2\hat\Lambda_1(t)\hat\Lambda_G(t)},$$

where $\hat A_1(t) = n\int_0^t d\hat\Lambda_1(s)/\sum_{j=1}^n Y_j(s)$, and $\hat A_G(t)$ and $\hat\Lambda_G(t)$ are the obvious analogs of $\hat A_1(t)$ and $\hat\Lambda_1(t)$.

(2) A consistent estimator for $\Gamma_a(t, t')$ may be obtained by replacing $\rho$, $\Lambda$, $A$, $\Lambda_G$ and $A_G$ in (3.6) by $\hat\rho$, $\hat\Lambda_1$, $\hat A_1$, $\hat\Lambda_G$ and $\hat A_G$.

(3) Two special cases deserve extra attention. If $a = 1$, then $\hat\Lambda(a, t)$ reduces to $\hat\Lambda_1(t)$. In that case, the asymptotic variance is $\Gamma_1(t) = \rho^{-1}\{A(t) - (1 - \rho)\Lambda^2(t)\}$, which agrees with Lo's result when the exponentiation is taken into account, and which is less than $\rho^{-1}A(t)$, the variance of the complete-case estimator. On the other hand, if we let $a = \hat\rho$, then

$$\hat\Lambda(\hat\rho, t) = \int_0^t \frac{\sum_{i=1}^n \{\xi_i\,dN_i^u(s) + (1 - \xi_i)\,dN_i(s) - \hat\rho^{-1}(1 - \hat\rho)\,\xi_i\,dN_i^c(s)\}}{\sum_{i=1}^n Y_i(s)},$$

with asymptotic variance $\Gamma_\rho(t) = A(t) + \rho^{-1}(1 - \rho)\{A_G(t) - \Lambda_G^2(t)\}$. Clearly, $\Gamma_\rho(t) \le \Gamma_1(t)$ if and only if $A_G(t) - \Lambda_G^2(t) \le A(t) - \Lambda^2(t)$. Note that $A_G(t) - \Lambda_G^2(t) = \mathrm{Var}\{\int_0^t dN_1^c(s)/H(s)\}$ and $A(t) - \Lambda^2(t) = \mathrm{Var}\{\int_0^t dN_1^u(s)/H(s)\}$.

(4) Let $\rho \to 1$, i.e., let the proportion of missing $\delta_i$'s shrink to 0. Then $a^* \to 1$, and the resulting estimator is $\hat\Lambda_1(t)$. On the other hand, if the censorship shrinks to 0, which entails $\Lambda_G(t) \to 0$ and $A_G(t) \to 0$, then $a^* \to \rho$, which is the case discussed in the previous remark.

Table 3 displays the main results from our Monte Carlo studies on the adaptive estimator $\hat F(\hat a^*, t) = 1 - e^{-\hat\Lambda(\hat a^*, t)}$. The biases of the adaptive estimator and its variance estimator are small. The efficiency improvements of the adaptive estimator over the complete-case analysis and (to a lesser extent) over estimator (3.1) are impressive, especially for light censoring and substantial missingness.

Table 3
Simulation summary statistics for the adaptive estimator $\hat F(\hat a^*, t)$ at $t = F^{-1}(0.5)$ under the exponential model $F(t) = 1 - e^{-t}$

                                                   20% Censoring        50% Censoring        70% Censoring
                                                   $\rho$=0.8  0.5      $\rho$=0.8  0.5      $\rho$=0.8  0.5
Mean of $\hat a^*$                                 0.84      0.60       0.90      0.75       0.94      0.85
Mean of $\hat F(\hat a^*, t)$                      0.497     0.497      0.496     0.495      0.493     0.490
Var of $n^{1/2}\hat F(\hat a^*, t)$                0.284     0.325      0.419     0.566      0.786     1.161
Mean of $\hat V_F(\hat a^*, t)$                    0.283     0.323      0.413     0.547      0.747     1.039
Var of $\hat F_d(t)$ / Var of $\hat F(\hat a^*, t)$  1.21    1.72       1.14      1.36       1.12      NA
Var of $\hat F_B(t)$ / Var of $\hat F(\hat a^*, t)$  1.10    1.33       1.06      1.13       1.06      1.08

NOTE: The censoring time is exponential. The sample size is $n = 100$. Each block is based on 10,000 replications. $\hat V_F(\hat a^*, t)$ is the variance estimator for $n^{1/2}\hat F(\hat a^*, t)$, which is $\{1 - \hat F(\hat a^*, t)\}^2$ multiplied by the estimator for $\Gamma_{a^*}(t, t)$ mentioned in Remark (2) of Theorem 3.1. $\hat F_d(t)$ is the estimator based on complete cases only and $\hat F_B(t)$ is Lo's second estimator. "Mean" and "Var" refer to the sampling mean and variance. NA indicates that the result for the complete-case estimator is not obtainable.
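The variance (3.7) and the weight $a^*$ can be evaluated in closed form in the exponential setting of Table 3, assuming $T \sim$ Exp(1) and $C \sim$ Exp($\lambda_c$) with $\lambda_c$ chosen so that $P(C < T)$ equals the censoring fraction. Then $H(s) = e^{-(1 + \lambda_c)s}$, $\Lambda(t) = t$, $\Lambda_G(t) = \lambda_c t$, $A(t) = \{e^{(1 + \lambda_c)t} - 1\}/(1 + \lambda_c)$ and $A_G(t) = \lambda_c A(t)$. By our calculation, the limiting values of $a^*$ at $t = \log 2$ agree to two decimals with the "Mean of $\hat a^*$" row of Table 3; this is a check of the formulas, not a re-run of the simulation. The function names are ours.

```python
import numpy as np


def exponential_quantities(t, cens_rate):
    """Lambda(t), Lambda_G(t), A(t), A_G(t) when T ~ Exp(1) and C ~ Exp(cens_rate).

    Here H(s) = e^{-(1 + cens_rate) s}, so A(t) = {e^{(1 + cens_rate) t} - 1} / (1 + cens_rate)
    and A_G(t) = cens_rate * A(t).
    """
    k = 1.0 + cens_rate
    a_t = np.expm1(k * t) / k
    return t, cens_rate * t, a_t, cens_rate * a_t


def gamma_a(a, rho, lam, lam_g, big_a, big_a_g):
    """Asymptotic variance Gamma_a(t) of (3.7)."""
    term1 = (big_a - (1 - rho) * lam ** 2) / rho
    term2 = lam ** 2 + lam * lam_g / rho
    term3 = (big_a + big_a_g / rho - rho * (lam + lam_g / rho) ** 2) / (1 - rho)
    return a ** 2 * term1 + 2 * a * (1 - a) * term2 + (1 - a) ** 2 * term3


def a_star(rho, lam, lam_g, big_a, big_a_g):
    """Variance-minimizing weight a* stated in Theorem 3.1."""
    num = rho * (big_a - lam ** 2) + big_a_g - lam_g ** 2 - (1 + rho) * lam * lam_g
    den = big_a - lam ** 2 + big_a_g - lam_g ** 2 - 2 * lam * lam_g
    return num / den


t = np.log(2.0)                     # median of Exp(1), as in Table 3
for cens in (0.2, 0.5, 0.7):
    q = exponential_quantities(t, cens / (1 - cens))   # P(C < T) = cens
    print(cens, [round(a_star(rho, *q), 3) for rho in (0.8, 0.5)])
```

The same functions reproduce the special cases of Remark (3): `gamma_a(1.0, ...)` gives $\Gamma_1(t)$ and `gamma_a(rho, ...)` gives $\Gamma_\rho(t)$.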



Proof of Theorem 3.1. In analogy with the approximations given in Lemma 1, we 
can show that 

i=l J 

= Li(i) + o p (n~i), say, 
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and 



1 dNf{s) 



1=1 i=i 

= L 2 (t) +o p (n^^), say. 



H(s) 



6 ~P 



Y t (s)dA(s) 
H(s) 



Ki - P) 



Thus to characterize the limiting distribution of A(a, •), it suffices to derive the 
covariance functions E {Lj(t)Lk(t')} (j,k — 1,2). Through some tedious, but oth- 
erwise routine calculations, we obtain 



(3.10) EiL^L^t')} - (np)- 1 {A(t A t') - (1 - p)A(t)A(t')} . 



(3.11) E {L^L^t')} - « _1 (A(t)A(t') + p^A^A^')} , 

(3.12) E {L 2 (t)L 2 (t')} = {n(l - p)} -1 A t') + p" 1 ^ A t') 

-p (A(t) + p-'Acit)} {A(f ) + p^A^f )}] . 

From (3.10)-(3.12), we can evaluate $nE[\{aL_1(t) + (1-a)L_2(t)\}\{aL_1(t') + (1-a)L_2(t')\}]$ to get the desired covariance function. □
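As a numerical sanity check on (3.10)-(3.12), the following Python sketch evaluates the combined variance $nE[\{aL_1(t) + (1-a)L_2(t)\}^2]$ at a single $t$ from scalar values of $A(t)$, $\Lambda(t)$, $A_G(t)$, $\Lambda_G(t)$ and $p$, and compares the cases $a = 1$ and $a = p$ discussed in the remarks following Theorem 3.1. The function name and the numeric inputs are illustrative assumptions, not quantities computed from data.

```python
import math

def gamma_cov(a, A, Lam, AG, LamG, p):
    """n * Var{a*L1(t) + (1 - a)*L2(t)} at one t, using (3.10)-(3.12) with t = t'."""
    v1 = (A - (1 - p) * Lam**2) / p                            # n E{L1(t)^2}
    c = Lam**2 + Lam * LamG / p                                # n E{L1(t) L2(t)}
    v2 = (A + AG / p - p * (Lam + LamG / p) ** 2) / (1 - p)    # n E{L2(t)^2}
    return a**2 * v1 + 2 * a * (1 - a) * c + (1 - a) ** 2 * v2

# Illustrative, made-up values of A(t), Lambda(t), A_G(t), Lambda_G(t) and p.
A, Lam, AG, LamG, p = 0.8, 0.5, 0.6, 0.4, 0.7

gamma_1 = gamma_cov(1.0, A, Lam, AG, LamG, p)   # a = 1
gamma_p = gamma_cov(p, A, Lam, AG, LamG, p)     # a = p
complete_case = A / p                           # p^{-1} A(t), complete-case variance
print(gamma_1 < complete_case, gamma_p < gamma_1)
```

Setting $a = p$ in the quadratic reproduces $\Gamma_p(t) = A(t) + p^{-1}(1-p)\{A_G(t) - \Lambda_G^2(t)\}$ exactly, so the comparison between $\Gamma_p$ and $\Gamma_1$ reduces to comparing the two variances $A_G - \Lambda_G^2$ and $A - \Lambda^2$.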

We now return to the regression model studied in Section 2. Let $\hat\beta$ be as defined by (2.5). To estimate the cumulative baseline hazard function $\Lambda_0$, it is natural to extend the class of estimators given in (3.5). To avoid complicated asymptotic variances, we shall consider only $a = 1$ and $a = p$, the two special cases discussed in Remarks (3) and (4) following Theorem 3.1. The two estimators for $\Lambda_0(t)$ are given below:



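(3.13) $\hat\Lambda_1(\beta,t) = \int_0^t \frac{\sum_{i=1}^n \xi_i\,dN_i^u(s)}{\hat p\sum_{i=1}^n Y_i(s)e^{\beta'Z_i(s)}}$,

(3.14) $\hat\Lambda_2(\beta,t) = \int_0^t \frac{\sum_{i=1}^n [\xi_i\,dN_i^u(s) + (1-\xi_i)\{dN_i^u(s) + dN_i^c(s)\} - \hat p^{-1}(1-\hat p)\,\xi_i\,dN_i^c(s)]}{\sum_{i=1}^n Y_i(s)e^{\beta'Z_i(s)}}$.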
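The following Python sketch evaluates $\hat\Lambda_1(\beta,t)$ and $\hat\Lambda_2(\beta,t)$ as written in (3.13) and (3.14) for a given $\beta$. It is illustrative only and rests on assumptions not stated in the text: covariates are time-fixed, observation times have no ties, and $\hat p$ is taken to be the sample proportion of subjects with $\xi_i = 1$. The function name `baseline_cumhaz` is hypothetical.

```python
import numpy as np

def baseline_cumhaz(x, delta, xi, z, beta, t):
    """Return (Lambda1_hat, Lambda2_hat) at time t for a fixed beta.

    x: observed times X_i (assumed distinct); delta: failure indicators,
    ignored where xi == 0; xi: 1 if delta_i is observed; z: (n, d) time-fixed
    covariates; beta: (d,) regression coefficients.
    """
    x = np.asarray(x, float)
    xi = np.asarray(xi, int)
    d_obs = np.where(xi == 1, np.asarray(delta, int), 0)  # use delta only when observed
    risk = np.exp(np.asarray(z, float) @ np.asarray(beta, float))  # e^{beta'Z_i}
    p_hat = xi.mean()                  # assumption: overall proportion of known deltas
    g = (1 - p_hat) / p_hat
    lam1 = lam2 = 0.0
    for i in np.argsort(x):
        if x[i] > t:
            break
        s0 = risk[x >= x[i]].sum()     # sum_j Y_j(X_i) e^{beta'Z_j}
        # (3.13): known failures, reweighted by 1 / p_hat
        lam1 += xi[i] * d_obs[i] / (p_hat * s0)
        # (3.14): known failures + every event with unknown status
        #         - (1 - p_hat)/p_hat * known censorings
        w = xi[i] * d_obs[i] + (1 - xi[i]) - g * xi[i] * (1 - d_obs[i])
        lam2 += w / s0
    return lam1, lam2
```

Because the last term in (3.14) is subtracted at each known censoring time, $\hat\Lambda_2(\beta,\cdot)$, unlike the Breslow estimator, need not be monotone in finite samples.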



Theorem 3.2. Suppose that the assumptions of Theorem 2.3 are satisfied. Let $t_0 > 0$ be any number such that $EY_1(t_0) > 0$.

(i) The process $n^{1/2}\{\hat\Lambda_1(\hat\beta,\cdot) - \Lambda_0(\cdot)\}$ converges weakly in $D[0,t_0]$ to a zero-mean Gaussian process with covariance function

(3.15) $\Gamma_1(t,t') = p^{-1}\int_0^{t\wedge t'}\frac{d\Lambda_0(s)}{H_z(s)} - p^{-1}(1-p)\Lambda_0(t)\Lambda_0(t') + a'(t)\Sigma(D)a(t') - p^{-1}(1-p)\left[a'(t)\Omega E\{N_{1Z}^c(\infty)\}\Lambda_0(t') + a'(t')\Omega E\{N_{1Z}^c(\infty)\}\Lambda_0(t)\right]$,

where $H_z(s) = E\{Y_1(s)e^{\beta_0'Z_1(s)}\}$, $a(t) = \int_0^t z(\beta_0,s)\,d\Lambda_0(s)$ and $\Omega = \{pV + (1-p)DV\}^{-1}D$. For fixed $t$, $n^{1/2}\{\hat\Lambda_1(\hat\beta,t) - \Lambda_0(t)\} \stackrel{d}{\to} N(0,\Gamma_1(t))$, where

(3.16) $\Gamma_1(t) = p^{-1}\int_0^t\frac{d\Lambda_0(s)}{H_z(s)} - p^{-1}(1-p)\Lambda_0^2(t) + a'(t)\Sigma(D)a(t) - 2p^{-1}(1-p)a'(t)\Omega E\{N_{1Z}^c(\infty)\}\Lambda_0(t)$.

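(ii) The process $n^{1/2}\{\hat\Lambda_2(\hat\beta,\cdot) - \Lambda_0(\cdot)\}$ converges weakly to a zero-mean Gaussian process with covariance function

(3.17) $\Gamma_2(t,t') = \int_0^{t\wedge t'}\frac{d\Lambda_0(s)}{H_z(s)} + p^{-1}(1-p)\mathrm{Cov}\{N_{1H}^c(t), N_{1H}^c(t')\} + a'(t)\Sigma(D)a(t') - p^{-1}(1-p)\left(a'(t)\Omega E[N_{1Z}^c(\infty)\{N_{1H}^c(t') - EN_{1H}^c(t')\}] + a'(t')\Omega E[N_{1Z}^c(\infty)\{N_{1H}^c(t) - EN_{1H}^c(t)\}]\right)$,

where $N_{iH}^c(t) = \int_0^t dN_i^c(s)/H_z(s)$ $(i = 1,\ldots,n)$. For fixed $t$, $n^{1/2}\{\hat\Lambda_2(\hat\beta,t) - \Lambda_0(t)\} \stackrel{d}{\to} N(0,\Gamma_2(t))$, where

(3.18) $\Gamma_2(t) = \int_0^t\frac{d\Lambda_0(s)}{H_z(s)} + p^{-1}(1-p)\mathrm{Var}\{N_{1H}^c(t)\} + a'(t)\Sigma(D)a(t) - 2p^{-1}(1-p)a'(t)\Omega E[N_{1Z}^c(\infty)\{N_{1H}^c(t) - EN_{1H}^c(t)\}]$.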

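Remarks. (1) Consistent estimators for the variances $\Gamma_1(t)$ and $\Gamma_2(t)$ may be obtained in a straightforward manner. For example, let $\hat a(t) = \int_0^t \bar Z(\hat\beta,s)\,d\hat\Lambda_1(\hat\beta,s)$ and $\hat\Omega = \{\hat p\hat V + (1-\hat p)D\hat V\}^{-1}D$. Then a consistent estimator for $\Gamma_1(t)$ is

$\hat p^{-1}\int_0^t\frac{n\,d\hat\Lambda_1(\hat\beta,s)}{\sum_{j=1}^n Y_j(s)e^{\hat\beta'Z_j(s)}} - \hat p^{-1}(1-\hat p)\hat\Lambda_1^2(\hat\beta,t) + \hat a'(t)\hat\Sigma(D)\hat a(t) - 2\hat p^{-1}(1-\hat p)\hat a'(t)\hat\Omega\left[(n\hat p)^{-1}\sum_{i=1}^n \xi_i\int_0^\infty\{Z_i(s) - \bar Z(\hat\beta,s)\}\,dN_i^c(s)\right]\hat\Lambda_1(\hat\beta,t)$,

where $\hat\Sigma(D)$ is the consistent estimator given in Remark (2) following Theorem 2.3.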

(2) If $D = 0$, then the last term on the right-hand side of (3.16) disappears and the sum of the first and third terms becomes the variance of the complete-case estimator. Thus, the use of $\hat\Lambda_1(\hat\beta,t)$ reduces the variance by $p^{-1}(1-p)\Lambda_0^2(t)$.

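Proof of Theorem 3.2. Taking Taylor expansions at $\beta_0$, we get, for $l = 1, 2$,

(3.19) $\hat\Lambda_l(\hat\beta,t) = \hat\Lambda_l(\beta_0,t) - \int_0^t \bar Z'(\beta_0,s)\,d\hat\Lambda_l(\beta_0,s)\,(\hat\beta - \beta_0) + o_p(n^{-1/2})$.

By the approximations given in the proofs of Theorems 2.1 and 2.2, we can express $\hat\beta - \beta_0$ approximately as a sum of $n$ i.i.d. random vectors. Furthermore, similar

(3.20) $\hat\Lambda_1(\beta_0,t) - \Lambda_0(t) = (np)^{-1}\sum_{i=1}^n\int_0^t\frac{\xi_i\,dM_i(s)}{H_z(s)} + (np)^{-1}\sum_{i=1}^n(\xi_i - p)\int_0^t\frac{f_i(s)\,d\Lambda_0(s)}{H_z(s)} - (np)^{-1}\sum_{i=1}^n(\xi_i - p)\Lambda_0(t) + o_p(n^{-1/2})$

$\qquad = J_1(t) + J_2(t) + J_3(t) + o_p(n^{-1/2})$, say,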

where $f_i(s) = Y_i(s)e^{\beta_0'Z_i(s)}$. Let $S_k = S_{k1}(\infty) + S_{k2}(\infty)$ $(k = 1,2)$, where the $S_{kj}$ are defined in the proofs of Theorems 2.1 and 2.2. Then $E\{S_1J_3(t)\} = 0$ and



E- 



{5i Ji(t)} = (1-p)eU Wi(a)ri(s)dA (a) 



tAs dMi(«) 



-{l-p)E / l^Wi^yiWfr^^uJYiHdAoWdAoH. 
Jo Jo 



Moreover, we can show that $E\{S_1J_2(t)\} = -E\{S_1J_1(t)\}$. Therefore

(3.21) $E[S_1\{J_1(t) + J_2(t) + J_3(t)\}] = 0$.



Likewise, we can show that $E[S_{21}(\infty)\{J_1(t) + J_2(t) + J_3(t)\}] = 0$. Thus

(3.22) $E[S_2\{J_1(t) + J_2(t) + J_3(t)\}] = E[S_{22}(\infty)\{J_1(t) + J_2(t) + J_3(t)\}] = p^{-1}(1-p)E\{N_{1Z}^c(\infty)\}\Lambda_0(t)$.

From (3.21) and (3.22),

(3.23) $E[\{S_1 + DS_2\}\{J_1(t) + J_2(t) + J_3(t)\}] = p^{-1}(1-p)DE\{N_{1Z}^c(\infty)\}\Lambda_0(t)$.



It is also not difficult to show that

$E[\{J_1(t) + J_2(t) + J_3(t)\}\{J_1(t') + J_2(t') + J_3(t')\}] = (np)^{-1}\int_0^{t\wedge t'}\frac{d\Lambda_0(s)}{H_z(s)} - (np)^{-1}(1-p)\Lambda_0(t)\Lambda_0(t')$,

which, combined with (3.19), (3.20) and (3.23), yields the desired covariance function $\Gamma_1$.
For (ii), first note that

(3.24) $\hat\Lambda_2(\beta_0,t) - \Lambda_0(t) = n^{-1}\sum_{i=1}^n\int_0^t\frac{dM_i(s)}{H_z(s)} - (np)^{-1}\sum_{i=1}^n[N_{iH}^c(t) - E\{N_{iH}^c(t)\}](\xi_i - p) + o_p(n^{-1/2})$.

From (3.24), we can show that $\hat\Lambda_2(\beta_0,t) - \Lambda_0(t)$ is asymptotically uncorrelated with $S_1$ and $S_{21}$. The desired covariance formula (3.17) then follows by evaluating the asymptotic covariance between $\hat\Lambda_2(\beta_0,t) - \Lambda_0(t)$ and $S_{22}$. The details are omitted. □



4. Cox regression and cumulative hazard function estimation under Type II missingness

We now describe in detail the problem of Type II missingness mentioned in Section 1, using a slightly different notation. Let $(T_i^{(1)}, T_i^{(2)}, C_i, Z_i')$ $(i = 1,\ldots,n)$ be i.i.d. random vectors, where $T_i^{(1)}$ and $T_i^{(2)}$ denote two types of latent failure times, of which the first is of interest, and $C_i$ and $Z_i$ denote the censoring time and covariate vector as before. Suppose that, conditional on $Z_i$, the failure time $T_i^{(1)}$ is independent of $T_i^{(2)}$ and $C_i$, and has the hazard rate $\lambda(t\mid Z_i) = e^{\beta_0'Z_i}\lambda_0(t)$. Define $T_i = T_i^{(1)}\wedge T_i^{(2)}$, $\phi_i = I(T_i^{(1)} \le T_i^{(2)})$, $X_i = T_i\wedge C_i$ and $\delta_i = I(T_i \le C_i)$. Note that $\phi_i\delta_i$ indicates, by the value 1 vs. 0, whether or not the observation time $X_i$ is the failure time of interest $T_i^{(1)}$. In the standard competing risks setup, one observes $(X_i, \delta_i, \phi_i\delta_i, Z_i)$ for every $i$. With incomplete measurements on the failure types, however, the data consist of $(X_i, \delta_i, \xi_i, \xi_i\phi_i\delta_i, Z_i)$ $(i = 1,\ldots,n)$, where $\xi_i$ indicates, by the value 1 vs. 0, whether $\phi_i$ is known or unknown. We assume that $\xi_i$ is independent of all other variables, with $P(\xi_i = 1\mid X_i, \delta_i, \phi_i, Z_i) = \tau$. This has the same level of generality as assuming $P(\xi_i = 1\mid X_i, \delta_i = 1, \phi_i, Z_i) = \tau$, since for $\delta_i = 0$ the value of $\xi_i$ has no effect on the observations and can therefore be redefined to make the independence hold. We define $N_i^u$, $Y_i$, $\bar Z$ and $z$ as in Section 2.
In the absence of missing values, the partial likelihood score function for $\beta$ is

$S^*(\beta) = \sum_{i=1}^n\int_0^\infty\{Z_i(t) - \bar Z(\beta,t)\}\,\phi_i\,dN_i^u(t)$.

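By deleting all the cases with $\{\delta_i = 1, \xi_i = 0\}$, we obtain the complete-case estimating function

$\tilde S^*(\beta) = \sum_{i=1}^n\int_0^\infty\left[Z_i(t) - \frac{\sum_{j=1}^n\{\delta_j\xi_j + (1-\delta_j)\}Y_j(t)e^{\beta'Z_j(t)}Z_j(t)}{\sum_{j=1}^n\{\delta_j\xi_j + (1-\delta_j)\}Y_j(t)e^{\beta'Z_j(t)}}\right]\xi_i\phi_i\,dN_i^u(t)$.

Because the index set $\{j : (\delta_j\xi_j + 1 - \delta_j)Y_j(t) = 1\}$ is not a random subset of the risk set $\{j : Y_j(t) = 1\}$ (censored subjects are always retained, whereas failed subjects are retained only when $\xi_j = 1$), the complete-case analysis does not yield a consistent estimator for $\beta_0$. We shall use the ideas presented in Section 2 to estimate $\beta_0$ under Type II missingness.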
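The analogs of the estimating functions $S_1(\beta)$ and $S_2(\beta)$ studied in Section 2 are

$S_1^*(\beta) = \sum_{i=1}^n\int_0^\infty\{Z_i(t) - \bar Z(\beta,t)\}\,\xi_i\phi_i\,dN_i^u(t)$,

$S_2^*(\beta) = \sum_{i=1}^n\int_0^\infty\{Z_i(t) - \bar Z(\beta,t)\}\{(1-\xi_i) - \hat\tau^{-1}(1-\hat\tau)\xi_i(1-\phi_i)\}\,dN_i^u(t)$,

where $\hat\tau$ denotes the empirical estimate of $\tau$ obtained from the $\xi_i$. We have the following results for $S_k^*(\beta_0)$ $(k = 1,2)$, which are similar to those for $S_k(\beta_0)$ $(k = 1,2)$ given in Theorems 2.1 and 2.2.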



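Theorem 4.1. The random vector $n^{-1/2}\{S_1^*(\beta_0)', S_2^*(\beta_0)'\}'$ is asymptotically zero-mean normal with covariance matrix

$\begin{pmatrix} V_1^* & 0 \\ 0 & V_2^* \end{pmatrix}$,

where $V_1^* = \tau V^*$, $V_2^* = (1-\tau)V^* + \tau^{-1}(1-\tau)E(N_{1Z}^* - EN_{1Z}^*)^{\otimes 2}$, $N_{iZ}^* = \int_0^\infty\{Z_i(t) - z(\beta_0,t)\}(1-\phi_i)\,dN_i^u(t)$ and $V^* = E[\int_0^\infty\{Z_1(t) - z(\beta_0,t)\}^{\otimes 2}\phi_1\,dN_1^u(t)]$.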

In analogy with (2.5) for $\hat\beta$, we define $\hat\beta^*$ as a solution to

$S_1^*(\beta) + DS_2^*(\beta) = 0$,

where $D$ is a given $p\times p$ matrix. Then the following theorem, similar to Theorem 2.3, holds.

Theorem 4.2. Suppose that $\tau V^* + (1-\tau)DV^*$ is nonsingular. Then $n^{1/2}(\hat\beta^* - \beta_0) \stackrel{d}{\to} N(0,\Sigma^*(D))$, where

$\Sigma^*(D) = \{\tau V^* + (1-\tau)DV^*\}^{-1}(\tau V^* + DV_2^*D')\{\tau V^* + (1-\tau)V^*D'\}^{-1}$.

The optimal choice for $D$ is $D^* = (1-\tau)V^*(V_2^*)^{-1}$, in which case

$\Sigma^*(D^*) = \{\tau V^* + (1-\tau)^2V^*(V_2^*)^{-1}V^*\}^{-1}$.
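The optimality of $D^*$ in Theorem 4.2 can be checked numerically. The sketch below builds arbitrary positive definite stand-ins for $V^*$ and $V_2^*$ (assumptions, not estimates from any data), evaluates the sandwich $\Sigma^*(D)$, and compares $\Sigma^*(D^*)$ with the closed form; for other choices of $D$, the difference $\Sigma^*(D) - \Sigma^*(D^*)$ should be positive semidefinite, as in the usual efficiency argument for optimally weighted estimating equations.

```python
import numpy as np

def sigma_star(D, V, V2, tau):
    """Sandwich covariance Sigma*(D) of Theorem 4.2 (V and V2 symmetric)."""
    A = tau * V + (1 - tau) * D @ V        # {tau V* + (1 - tau) D V*}
    A_inv = np.linalg.inv(A)
    middle = tau * V + D @ V2 @ D.T        # asymptotic variance of S1* + D S2*
    return A_inv @ middle @ A_inv.T        # A_inv.T = {tau V* + (1 - tau) V* D'}^{-1}

rng = np.random.default_rng(0)
dim, tau = 3, 0.6
B = rng.normal(size=(dim, dim))
V = B @ B.T + dim * np.eye(dim)            # stand-in for V*, positive definite
E = rng.normal(size=(dim, dim))
V2 = (1 - tau) * V + E @ E.T               # stand-in for V2* = (1 - tau) V* + PSD term

D_opt = (1 - tau) * V @ np.linalg.inv(V2)  # D* of Theorem 4.2
S_opt = sigma_star(D_opt, V, V2, tau)
closed_form = np.linalg.inv(tau * V + (1 - tau) ** 2 * V @ np.linalg.inv(V2) @ V)

# Smallest eigenvalue of Sigma*(D) - Sigma*(D*) for a few random D (should be >= 0).
for _ in range(5):
    D = rng.normal(size=(dim, dim))
    diff = sigma_star(D, V, V2, tau) - S_opt
    print(np.linalg.eigvalsh((diff + diff.T) / 2).min())
```

At $D = D^*$ the "bread" and "meat" of the sandwich coincide, which is why $\Sigma^*(D^*)$ collapses to a single inverse.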

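Proof of Theorem 4.1. As in the proofs of Theorems 2.1 and 2.2, we can define the martingales $M_i^*(t) = \phi_iN_i^u(t) - \int_0^t Y_i(s)e^{\beta_0'Z_i(s)}\lambda_0(s)\,ds$ $(i = 1,\ldots,n)$ and derive the following key approximations:

$S_1^*(\beta_0) = \sum_{i=1}^n\int_0^\infty\{Z_i(t) - z(\beta_0,t)\}\xi_i\,dM_i^*(t) + \sum_{i=1}^n\int_0^\infty\{Z_i(t) - z(\beta_0,t)\}(\xi_i - \tau)Y_i(t)e^{\beta_0'Z_i(t)}\lambda_0(t)\,dt + o_p(n^{1/2})$,

$S_2^*(\beta_0) = \sum_{i=1}^n\int_0^\infty\{Z_i(t) - z(\beta_0,t)\}(1-\xi_i)\,dM_i^*(t) - \sum_{i=1}^n\int_0^\infty\{Z_i(t) - z(\beta_0,t)\}(\xi_i - \tau)Y_i(t)e^{\beta_0'Z_i(t)}\lambda_0(t)\,dt - \tau^{-1}\sum_{i=1}^n\{N_{iZ}^* - EN_{iZ}^*\}(\xi_i - \tau) + o_p(n^{1/2})$.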
These two approximations can be used to show, through some tedious calculations, that the asymptotic variance-covariance matrix of $n^{-1/2}\{S_1^*(\beta_0)', S_2^*(\beta_0)'\}'$ is $\mathrm{diag}\{V_1^*, V_2^*\}$. Hence, the theorem follows from the multivariate central limit theorem. □

Proof of Theorem 4.2. This follows from Theorem 4.1 and the arguments given in the proof of Theorem 2.3. □

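Using $\hat\beta^*$, whose asymptotic distribution is given by Theorem 4.2, we can construct consistent estimators for the cumulative baseline hazard function $\Lambda_0(t)$. Two such estimators, which correspond to $\hat\Lambda_1(\hat\beta,t)$ and $\hat\Lambda_2(\hat\beta,t)$ defined by (3.13) and (3.14), are

$\hat\Lambda_1^*(\hat\beta^*,t) = \int_0^t\frac{\sum_{i=1}^n\xi_i\phi_i\,dN_i^u(s)}{\hat\tau\sum_{i=1}^n Y_i(s)e^{\hat\beta^{*\prime}Z_i(s)}}$,

$\hat\Lambda_2^*(\hat\beta^*,t) = \int_0^t\frac{\sum_{i=1}^n\{\xi_i\phi_i + (1-\xi_i) - \hat\tau^{-1}(1-\hat\tau)\xi_i(1-\phi_i)\}\,dN_i^u(s)}{\sum_{i=1}^n Y_i(s)e^{\hat\beta^{*\prime}Z_i(s)}}$.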

The asymptotic properties given in Theorem 3.2 for $\hat\Lambda_k(\hat\beta,t)$ $(k = 1,2)$ also hold for $\hat\Lambda_k^*(\hat\beta^*,t)$ $(k = 1,2)$. They can be derived from the arguments used in the proof of Theorem 3.2. To simplify the statements, we present only the asymptotic normality part, although the weak convergence also holds.

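Theorem 4.3. Suppose that the assumptions of Theorem 4.2 hold and that $t$ satisfies $EY_1(t) > 0$. Then $n^{1/2}\{\hat\Lambda_k^*(\hat\beta^*,t) - \Lambda_0(t)\} \stackrel{d}{\to} N(0,\sigma_k^{*2}(t))$ $(k = 1,2)$, where, with $H_z(t)$ and $a(t)$ as defined in Theorem 3.2, $\Omega^* = \{\tau V^* + (1-\tau)DV^*\}^{-1}D$ and $N_{iH}^*(t) = \int_0^t(1-\phi_i)\,dN_i^u(s)/H_z(s)$ $(i = 1,\ldots,n)$,

$\sigma_1^{*2}(t) = \tau^{-1}\int_0^t\frac{d\Lambda_0(s)}{H_z(s)} - \tau^{-1}(1-\tau)\Lambda_0^2(t) + a'(t)\Sigma^*(D)a(t) - 2\tau^{-1}(1-\tau)a'(t)\Omega^*EN_{1Z}^*\Lambda_0(t)$,

$\sigma_2^{*2}(t) = \int_0^t\frac{d\Lambda_0(s)}{H_z(s)} + \tau^{-1}(1-\tau)\mathrm{Var}\{N_{1H}^*(t)\} + a'(t)\Sigma^*(D)a(t) - 2\tau^{-1}(1-\tau)a'(t)\Omega^*E[N_{1Z}^*\{N_{1H}^*(t) - EN_{1H}^*(t)\}]$.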

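For the one-sample case, where the data consist of i.i.d. random vectors $(X_i, \delta_i, \xi_i, \xi_i\phi_i\delta_i)$ $(i = 1,\ldots,n)$, we modify (3.2), (3.4) and (3.5) to obtain the following class of consistent estimators for the cumulative hazard function $\Lambda$ of $T_i^{(1)}$:

$\hat\Lambda^*(a,t) = a\int_0^t\frac{\sum_{i=1}^n\xi_i\phi_i\,dN_i^u(s)}{\hat\tau\sum_{i=1}^n Y_i(s)} + (1-a)\int_0^t\frac{\sum_{i=1}^n\{(1-\xi_i) - \hat\tau^{-1}(1-\hat\tau)\xi_i(1-\phi_i)\}\,dN_i^u(s)}{(1-\hat\tau)\sum_{i=1}^n Y_i(s)}$.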
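The one-sample estimator $\hat\Lambda^*(a,t)$ above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: observation times are distinct, and $\hat\tau$ is taken (hypothetically) as the proportion of known failure types among the observed failures. The function name `cumhaz_type2` is hypothetical.

```python
import numpy as np

def cumhaz_type2(x, delta, xi, phi, a, t):
    """Sketch of Lambda*(a, t) for the cumulative hazard of T^(1).

    x: observed times X_i (assumed distinct); delta: delta_i;
    xi: 1 if the failure type is known; phi: failure type phi_i,
    used only where xi == 1 and delta == 1.
    """
    x = np.asarray(x, float)
    delta = np.asarray(delta, int)
    xi = np.asarray(xi, int)
    phi_obs = np.where((xi == 1) & (delta == 1), np.asarray(phi, int), 0)
    tau_hat = xi[delta == 1].mean()   # hypothetical estimate of tau, from failures only
    g = (1 - tau_hat) / tau_hat
    lam1 = lam2 = 0.0
    for i in np.argsort(x):
        if x[i] > t:
            break
        if delta[i] == 0:             # dN_i^u jumps only at observed failures
            continue
        at_risk = np.sum(x >= x[i])   # sum_j Y_j(X_i)
        # first piece: failures of known type 1, reweighted by 1 / tau_hat
        lam1 += xi[i] * phi_obs[i] / (tau_hat * at_risk)
        # second piece: unknown-type failures minus a correction for known type-2 failures
        lam2 += ((1 - xi[i]) - g * xi[i] * (1 - phi_obs[i])) / ((1 - tau_hat) * at_risk)
    return a * lam1 + (1 - a) * lam2
```

With $a = \hat\tau$, the two pieces combine into the single integral $\int_0^t \sum_i\{\xi_i\phi_i + (1-\xi_i) - \hat\tau^{-1}(1-\hat\tau)\xi_i(1-\phi_i)\}\,dN_i^u(s)/\sum_i Y_i(s)$, the one-sample analog of $\hat\Lambda_2^*$.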



The arguments given in the proof of Theorem 3.1 can be used to verify the following asymptotic normality for $\hat\Lambda^*(a,t)$.
Theorem 4.4. For $t$ satisfying $EY_1(t) > 0$, $n^{1/2}\{\hat\Lambda^*(a,t) - \Lambda(t)\} \stackrel{d}{\to} N(0,\sigma^2(a))$, where

$\sigma^2(a) = a^2\tau^{-1}\{A(t) - (1-\tau)\Lambda^2(t)\} + 2a(1-a)\{\Lambda^2(t) + \tau^{-1}\Lambda(t)\Lambda_Q(t)\} + (1-a)^2(1-\tau)^{-1}\left[A(t) + \tau^{-1}A_Q(t) - \tau\{\Lambda(t) + \tau^{-1}\Lambda_Q(t)\}^2\right]$,

where $A(t) = \int_0^t d\Lambda(s)/EY_1(s)$, $A_Q(t) = \int_0^t d\Lambda_Q(s)/EY_1(s)$ and $\Lambda_Q$ is the cumulative hazard function of $T_i^{(2)}$. In particular,

$\sigma^2(1) = \tau^{-1}\{A(t) - (1-\tau)\Lambda^2(t)\}$,

$\sigma^2(\tau) = A(t) + \tau^{-1}(1-\tau)\{A_Q(t) - \Lambda_Q^2(t)\}$.

The variance $\sigma^2(a)$ is minimized when $a$ equals

$a^* = \frac{\tau\{A(t) - \Lambda^2(t)\} + A_Q(t) - \Lambda_Q^2(t) - (1+\tau)\Lambda(t)\Lambda_Q(t)}{A(t) - \Lambda^2(t) + A_Q(t) - \Lambda_Q^2(t) - 2\Lambda(t)\Lambda_Q(t)}$.
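The closed-form minimizer $a^*$ in Theorem 4.4 can be cross-checked against a brute-force grid search over $\sigma^2(a)$. In the Python sketch below, the inputs $A(t)$, $\Lambda(t)$, $A_Q(t)$, $\Lambda_Q(t)$ and $\tau$ are made-up illustrative numbers, and the function names are hypothetical.

```python
import numpy as np

def sigma2(a, A, Lam, AQ, LamQ, tau):
    """sigma^2(a) of Theorem 4.4 at a fixed t (vectorised over a)."""
    v1 = (A - (1 - tau) * Lam**2) / tau
    c = Lam**2 + Lam * LamQ / tau
    v2 = (A + AQ / tau - tau * (Lam + LamQ / tau) ** 2) / (1 - tau)
    return a**2 * v1 + 2 * a * (1 - a) * c + (1 - a) ** 2 * v2

def a_star(A, Lam, AQ, LamQ, tau):
    """Closed-form minimiser of sigma^2(a) stated in Theorem 4.4."""
    num = tau * (A - Lam**2) + AQ - LamQ**2 - (1 + tau) * Lam * LamQ
    den = A - Lam**2 + AQ - LamQ**2 - 2 * Lam * LamQ
    return num / den

A, Lam, AQ, LamQ, tau = 0.9, 0.5, 0.7, 0.4, 0.6    # illustrative values only
grid = np.linspace(-1.0, 2.0, 300001)
a_grid = grid[np.argmin(sigma2(grid, A, Lam, AQ, LamQ, tau))]
print(a_star(A, Lam, AQ, LamQ, tau), a_grid)
```

The same formula also exhibits the limiting behavior noted after Theorem 3.1 in the Type I setting: $a^* = \tau$ when $A_Q = \Lambda_Q = 0$, and $a^* = 1$ when $\tau = 1$.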

5. Discussion

We did not provide all the details for Type II missingness in Section 4 because of its similarity to Type I missingness. It should be noted that consistent estimators for variance quantities such as $\Sigma^*(D)$, $\sigma_k^{*2}(t)$ $(k = 1,2)$ and $\sigma^2(a)$ can be obtained in the same manner as their counterparts in Sections 2 and 3. Furthermore, the asymptotic approximations under Type II missingness are expected to have degrees of accuracy in finite samples similar to those under Type I missingness.

We have made the missing completely at random assumption in our developments. This assumption consists of two parts, the first being the independence between $\xi_i$ and $(X_i, \delta_i, \phi_i, Z_i)$ for every $i$, and the second being the i.i.d. property of $\xi_i$ $(i = 1,\ldots,n)$. The first part of the assumption cannot be avoided without directly modeling the missingness processes. The second part can be relaxed to the extent that a consistent estimator for $P(\xi_i = 1)$ is available for every $i$. For example, in a multi-institutional study, it may be reasonable to assume that the missingness probabilities are constant within the same institution but vary among institutions. In this case, we may stratify our data on the institutions and modify the methods described in the previous sections to incorporate the stratification factor.

In many applications, the measurements on the covariate vectors are incomplete. 
Q and subsequent papers provide solutions to this problem. It is possible to combine 
the techniques developed in this paper with those of Q to handle the situation where 
both the covariates and the failure indicators are partially measured. The details 
will not be presented here. 